14EC770 : ASIC DESIGN


Preamble 
14EC270 : Digital Logic Circuit Design
14EC520 : Digital CMOS Systems


Objective
This course provides students with knowledge about:
- the physical design flow of an IC
- floorplanning, placement and routing.

- Experiments explore the complete digital design flow of a programmable ASIC through VLSI EDA tools.

- Students work from design entry using Verilog code to GDSII file generation of an ASIC.


Concept MAP

ASIC DESIGN consists of:
1. ASIC Types and Construction
  - 1.1-1.5 Types of ASICs (detailed topic titles garbled in source)
  - 1.6 ASIC Construction
2. System Partitioning
  - 2.1 Measurement of Partitioning
  - 2.2 Constructive Partitioning
  - 2.3 Iterative Partitioning
  - 2.4 Kernighan-Lin Algorithm
  - 2.5 FPGA Partitioning
  - 2.6 Power Dissipation
3. Floorplanning and Placement
  - 3.1 Floorplanning Measurement
  - 3.2 I/O, Power and Clock Planning
  - 3.3 Measurement of Placement
  - 3.4 Placement Algorithms
  - 3.5 Eigenvalue Placement
  - 3.6 Iterative Placement Algorithms
4. Routing and Circuit Extraction
  - 4.1 Global Routing Measurement
  - 4.2 Global Routing
  - 4.3 Detailed Routing Measurement
  - 4.4 Detailed Routing Algorithms
  - 4.5 Circuit Extraction
  - 4.6 Layout Design Rules
5. Digital design flow (experiments)
  - 5.1 Digital Design Flow
  - 5.2 Design and Simulation
  - 5.3 Synthesis and Analysis
  - 5.4 Floorplanning & Placement
  - 5.5 Routing, Circuit Extraction and Optimization
  - 5.6 GDS-II File Generation


Course Outcomes

1. … technologies of an ASIC and its construction.

2. (Apply) Describe the goals, objectives, measurements and algorithms of partitioning, then apply those algorithms to partition the network to meet the objectives.

3. (Apply) Describe the goals, objectives, measurements and algorithms of floorplanning and placement, then apply those algorithms to place the logic cells inside the flexible blocks of an ASIC to meet the objectives.

4. (Analyze) Describe the goals, objectives, measurements and algorithms of routing, then apply those algorithms to route the channels; describe various circuit extraction formats; and investigate the issues and discover solutions in each step of the physical design flow of an ASIC.

5. (Analyze) Design an ASIC for digital circuits using the ASIC design flow steps of simulation, synthesis, floorplanning, placement, routing and circuit extraction, and generate the GDSII file for fabrication of the ASIC; then analyze the ASIC to meet performance targets in terms of area, speed and power using EDA tools.
Integrated Circuit


- Wafer: a circular piece of pure silicon (10-15 cm in diameter; wafers of 30 cm diameter are expected soon)

- Wafer lot: 5 to 30 wafers, each containing hundreds of chips (dies), depending upon the size of the die

- Die: a rectangular piece of silicon that contains one IC design

- Mask layers: each IC is manufactured with successive mask layers (10 to 15 layers)
  - The first half-dozen or so layers define the transistors
  - The other half-dozen define the interconnect
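To connect wafer size, die size and the "hundreds of chips" per wafer, a common first-order approximation for gross dies per wafer is DPW ≈ πd²/(4A) − πd/√(2A), where d is the wafer diameter and A is the die area; the second term discounts partial dies lost at the wafer edge. The sketch below assumes this approximation and ignores scribe lines, edge exclusion and yield; the 150 mm wafer and 100 mm² die are hypothetical values chosen only for illustration.

```python
import math


def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order gross dies-per-wafer estimate (no scribe lines, edge exclusion or yield)."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2  # area of the circular wafer
    # Approximate number of partial dies lost around the circumference
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(wafer_area / die_area_mm2 - edge_loss)


# Hypothetical example: a 15 cm (150 mm) wafer and a 10 mm x 10 mm die
print(dies_per_wafer(150, 100))
```

Halving the die edge (quartering its area) raises the count by much more than 4x in the first term but also increases the edge term, which is why smaller dies use the wafer more efficiently.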


Integrated Circuit (IC) in a package

[Figure: (a) A pin-grid array (PGA) package. (b) The silicon die or chip is under the package lid. Scale bar: 0.1 inch.]


Evolution of IC

SSI (Small-Scale Integration), 1962
- Tens of transistors
- Examples: NAND and NOR gates

MSI (Medium-Scale Integration), late 1960s
- Hundreds of transistors
- Example: counters

LSI (Large-Scale Integration), mid 1970s
- Tens of thousands of transistors
- Example: the first microprocessors

VLSI (Very Large-Scale Integration), 1980s
- Started at hundreds of thousands of transistors; several billion transistors by 2009
- Example: 64-bit microprocessors with cache memory and floating-point arithmetic units

ULSI (Ultra Large-Scale Integration), late 1980s
- More than about one million circuit elements on a single chip
- Example: the Intel 486 and Pentium microprocessors use ULSI technology


IC technologies

Bipolar
- More accuracy

MOS
- Gate: aluminium
- Low power consumption
- Low cost

CMOS
- Gate: polysilicon
- Low power consumption
- Low cost

BiCMOS


Types of IC

Standard ICs

Glue logic: microelectronic system design then becomes a matter of defining the functions that you can implement using standard ICs, and then implementing the remaining logic functions (sometimes called glue logic) with one or more custom ICs.

ASIC

ASSP (Application-Specific Standard Products)


ASIC and Non-ASIC

- Examples of ICs that are not ASICs include standard parts such as:
  - memory chips sold as commodity items (ROMs, DRAM, and SRAM);
  - microprocessors;
  - TTL or TTL-equivalent ICs at SSI, MSI, and LSI levels.

- Examples of ICs that are ASICs include:
  - a chip for a toy bear that talks;
  - a chip for a satellite;
  - a chip designed to handle the interface between memory and a microprocessor for a workstation CPU;
  - a chip containing a microprocessor as a cell together with other logic.

- ASSPs (two ICs that might or might not be considered ASICs):
  - a controller chip for a PC and a chip for a modem.
  - Both of these examples are specific to an application (shades of an ASIC) but are sold to many different system vendors (shades of a standard part).

ASICs such as these are sometimes called application-specific standard products (ASSPs).


Measurement of IC
- Gate equivalent
  - IC size is measured as a number of gates (or transistors)
  - A "gate" refers to a two-input NAND gate
  - In CMOS, each two-input NAND gate consists of 4 transistors
  - Example: a 10k-gate IC has the equivalent of 10,000 two-input NAND gates, or 40,000 transistors in CMOS

- Feature size
  - The feature size is the smallest transistor length; λ (lambda) is half of the feature size
  - Example: for a 0.5 µm IC, λ = 0.25 µm
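Both measures reduce to simple arithmetic: one gate equivalent is one two-input NAND gate (4 transistors in CMOS), and λ is half the feature size. A minimal sketch; the function and constant names are chosen here for illustration.

```python
# A CMOS two-input NAND uses 2 pMOS transistors in parallel and 2 nMOS in series
TRANSISTORS_PER_NAND2 = 4


def gate_equivalents_to_transistors(gates: int) -> int:
    """Convert a gate-equivalent count into an approximate CMOS transistor count."""
    return gates * TRANSISTORS_PER_NAND2


def lambda_um(feature_size_um: float) -> float:
    """Lambda is half the feature size (the smallest transistor length)."""
    return feature_size_um / 2


print(gate_equivalents_to_transistors(10_000))  # 10k-gate IC
print(lambda_um(0.5))                           # 0.5 um IC
```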


Types of ASICs

- Full-Custom ASICs
- Semi-Custom ASICs
  - Standard-Cell based ASICs
  - Gate-Array based ASICs
- Programmable ASICs

- Full-Custom ASICs: possibly all logic cells and all mask layers are customized

- Semi-Custom ASICs: all logic cells are pre-designed and some (possibly all) mask layers are customized
Full-Custom ASICs

- Include some (possibly all) customized logic cells
- Have all their mask layers customized

- Manufacturing lead time is typically 8 weeks (the time taken to make the IC, not including design time)

- Full-custom ASIC design makes sense only when:
  - no suitable cell libraries exist, or
  - existing library cells are not fast enough, or
  - the available pre-designed/pre-tested cells consume more power than the design can allow, or
  - the available logic cells are not compact enough to fit, or
  - the ASIC technology is new and/or so special that no cell library exists.


Full-Custom ASICs (Cont'd)

- Advantages:
  - offer the highest performance
  - lowest part cost (smallest die size)

- Disadvantages:
  - increased design time
  - increased complexity
  - higher design cost
  - higher risk

- Some examples:
  - microprocessors
  - high-voltage automobile control chips
  - analog/digital communication chips
  - sensors and actuators
Semi-Custom ASICs
Standard-Cell based ASICs (CBIC, "sea-bick")

- Use predesigned logic cells (called standard cells) from standard cell libraries, together with:
  - other megacells (microcontrollers or microprocessors)
  - full-custom blocks
  - System-Level Macros (SLMs)
  - Functional Standard Blocks (FSBs)
  - cores, etc.
- All mask layers are customized: transistors and interconnect
- Manufacturing lead time is about 8 weeks
- Custom blocks can be embedded
Semi-Custom ASICs (Cont'd)
Standard-Cell based ASICs (CBIC, "sea-bick") (Cont'd)

[Figure: Routing a CBIC (cell-based IC). Expanded view of part of a flexible block: rows of standard cells with metal1 VSS and VDD rails, metal2 wiring, and connections to the power pads.]

- A "wall" of standard cells forms a flexible block.

- metal2 may be used in a feedthrough cell to cross over cell rows that use metal1 for wiring.

- Other wiring cells: spacer cells, row-end cells, and power cells.
Standard Cell in Flexible Block of CBIC

- Each standard cell in the library is constructed using a full-custom design methodology.
  - This gives the same performance and flexibility as full-custom design, but reduces design time and risk.
- The ASIC designer defines only the placement of the standard cells.
  - A cell can be placed anywhere on the silicon.

Construction of Flexible Blocks in CBIC

- Standard cells are designed to fit together like bricks in a wall.
- Groups of standard cells fit horizontally to form rows.
- The rows stack vertically to form flexible blocks, which can be reshaped during design.

Flexible blocks are connected with other standard-cell blocks or full-custom blocks.


Wiring Cells in Standard-Cell based ASICs

- Feedthrough cell: a piece of metal that is used to pass a signal through a cell, or a space in a cell waiting to be used as a feedthrough.

- Spacer cells: the width of each row of standard cells is adjusted so that the rows can be aligned using spacer cells.

- Row-end cells: the power buses, or rails, are then connected to additional vertical power rails using row-end cells at the aligned ends of each standard-cell block.

- Power cells: if the rows of standard cells are long, then vertical power rails can also be run in metal2 through the cell rows using special power cells that just connect to VDD and GND.

- Usually the designer manually controls the number and width of the vertical power rails connected to the standard-cell blocks during physical design.


Advantages of CBIC
- Saves time and money, and reduces risk
- Each standard cell can be optimized individually for speed or area

Disadvantages of CBIC
- Time to design the standard-cell library
- Expense of designing the standard-cell library
- Time needed to fabricate all layers of the ASIC for each new design
Semi-Custom ASICs (Cont'd)
Gate-Array based ASICs

- Transistors are predefined on the silicon wafer.
- The predefined pattern of transistors on a gate array is the base array.
- The smallest element that is repeated to form the base array is the base cell.
- Only the top few layers of metal, which define the interconnect between transistors, are defined by the designer using custom masks. This is often called a masked gate array (MGA).
- Shorter turnaround time: a few days or a couple of weeks.

A gate array, masked gate array, MGA, or prediffused array uses macros (books) to reduce turnaround time and comprises a base array made from a base cell or primitive cell. There are three types:
- Channeled gate arrays
- Channelless gate arrays
- Structured gate arrays

Channeled Gate Array
- Only the interconnect is customized.
- The interconnect uses predefined spaces between rows of base cells.
- Manufacturing lead time is between two days and two weeks.
- Similar to a CBIC, but here the routing space between rows is fixed.


Semi-Custom ASICs (Cont'd)
Channelless Gate Array ASIC

- A channelless gate array (channel-free gate array, sea-of-gates array, or SOG array)
- Only some (the top few) mask layers are customized: the interconnect.
- Manufacturing lead time is between two days and two weeks.

The key differences between a channelless gate array and a channeled gate array:
- There are no predefined areas set aside for routing between cells on a channelless gate array.
- To use an area of transistors for routing in a channelless array, we do not make any contacts to the devices lying underneath; we simply leave the transistors unused.
- The logic density (the amount of logic that can be implemented in a given silicon area) is higher for a channelless gate array.
Semi-Custom ASICs (Cont'd)
Difference between Channeled and Channelless Gate Array ASICs

- The contact mask is customized in a channelless gate array, but is not usually customized in a channeled gate array. This leads to denser cells in the channelless architectures.

- Customizing the contact layer in a channelless gate array allows us to increase the density of gate-array cells, because we can route over the top of unused contact sites.
Semi-Custom ASICs (Cont'd)
Structured Gate-Array based ASICs

- An embedded gate array or structured gate array (masterslice or masterimage)
- Only the interconnect is customized.
- Custom blocks (the same for each design) can be embedded.
- Manufacturing lead time is between two days and two weeks.

[Figure: structured gate array containing an embedded block.]

- An embedded gate array or structured gate array (also known as masterslice or masterimage) combines some of the features of CBICs and MGAs.

- One of the disadvantages of the MGA is the fixed gate-array base cell. This makes the implementation of memory, for example, difficult and inefficient.

- In an embedded gate array we set aside some of the IC area and dedicate it to a specific function.

- This embedded area can either contain a different base cell that is more suitable for building memory cells, or contain a complete circuit block, such as a microcontroller.
Channeled gate array
- Adv: specific space for interconnect
- Disadv: compared with a CBIC, the routing space is not adjustable

Channelless gate array
- Adv:
  - logic density is higher
  - contact layers are customized
- Disadv:
  - no specific area for routing
  - rows of transistors used for routing cannot be used for any other purpose

Structured gate array
- Adv:
  - some of the IC area is set aside and dedicated to a specific (customized) function
  - increased area efficiency and performance, as in a CBIC
  - low cost and fast turnaround of an MGA
- Disadv:
  - the embedded function is fixed


Semi-Custom ASICs (Cont'd)
Programmable ASICs

- PLDs: low-density devices that contain 1k to 10k gates and are available in both bipolar and CMOS technologies [PLA, PAL or GAL]

- CPLDs, FPLDs or FPGAs: FPGAs combine the architecture of gate arrays with the programmability of PLDs.
  - User configurable
  - Contain regular structures: circuit elements such as AND, OR, NAND/NOR gates, flip-flops, multiplexers, RAMs
  - Allow different programming technologies
  - Allow both matrix-based and row-based architectures

[Figure: inputs (logic variables) feed a block of logic gates and programmable switches, which produces the outputs (logic functions).]
Programmable Logic Devices

Programmable logic devices (PLDs) are standard ICs:
- available in standard configurations
- sold in very high volume to many different customers.

PLDs may be configured or programmed to create a part customized to a specific application.
- PLDs use different technologies to allow programming of the device.

The important features of PLDs:
- No customized mask layers or logic cells
- Programmable interconnect
- Fast design turnaround

Structure of a programmable logic device (PLD):
- A single large block of programmable interconnect
- A matrix of logic macrocells that usually consist of programmable array logic followed by a flip-flop or latch
Examples of PLD

The simplest type of programmable IC is a read-only memory (ROM).

The most common types of ROM use a metal fuse that can be blown permanently (a programmable ROM, or PROM).

An erasable programmable ROM (EPROM) uses programmable MOS transistors whose characteristics are altered by applying a high voltage.

Erasing an EPROM:
- either by using another high voltage (an electrically erasable PROM, or EEPROM)
- or by exposing the device to ultraviolet light (a UV-erasable PROM, or UVPROM).

There is another type of ROM that can be placed on any ASIC: a mask-programmable ROM (mask-programmed ROM or masked ROM).
- A masked ROM is a regular array of transistors permanently programmed using custom mask patterns.
- An embedded masked ROM is thus a large, specialized logic cell.


PROM

Use a PROM to implement:
- an inverter: F1 = A'
- OR: F2 = A + B
- NAND: F3 = (A·B)'
- XOR: F4 = A ⊕ B

[Figure: PROM with inputs A, B, C and outputs Y3 Y2 Y1 Y0. The truth table is transferred directly to the PROM grid; dots indicate connections to AND gate inputs.]

Drawbacks: a PROM is not fast enough, and it occupies more space.
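Because the PROM's AND array is a full, fixed decoder, programming it means storing one output word per input combination, i.e. writing the truth table straight into the grid. The sketch below models that for the four functions listed above (inverter, OR, NAND, XOR) of inputs A and B; the bit ordering (A as address bit 0, output Fk on word bit k-1) and the function names are assumptions made for the example.

```python
from typing import Callable, List


def program_prom(num_inputs: int, functions: List[Callable[..., int]]) -> List[int]:
    """Return the PROM contents: one word per address, bit j = functions[j](inputs)."""
    words = []
    for address in range(2 ** num_inputs):
        # The fixed AND array (decoder) turns each address into exactly one minterm
        inputs = [(address >> i) & 1 for i in range(num_inputs)]
        word = 0
        for j, f in enumerate(functions):
            word |= (f(*inputs) & 1) << j  # programmable OR array: one column per output
        words.append(word)
    return words


def read_prom(words: List[int], *inputs: int) -> int:
    """Apply inputs: the decoder selects the single stored word for that address."""
    address = sum((bit & 1) << i for i, bit in enumerate(inputs))
    return words[address]


functions = [
    lambda a, b: 1 - a,        # F1 = A'      (inverter)
    lambda a, b: a | b,        # F2 = A + B   (OR)
    lambda a, b: 1 - (a & b),  # F3 = (A.B)'  (NAND)
    lambda a, b: a ^ b,        # F4 = A xor B (XOR)
]
prom = program_prom(2, functions)
print([format(word, "04b") for word in prom])  # one 4-bit word per address (A is bit 0)
```

Every input combination costs a full row whether or not any output uses it, which is one way to see why the notes say a PROM occupies more space.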
Types of PLDs: PLA and PAL

We can place a logic array as a cell on a custom ASIC. This type of logic array is called a programmable logic array (PLA).

A PLA has a programmable AND logic array, or AND plane, followed by a programmable OR logic array, or OR plane.

A PAL has a programmable AND plane and, in contrast to a PLA, a fixed OR plane.

Depending on how the PLD is programmed, we can have an:
- erasable PLD (EPLD), or
- mask-programmed PLD (sometimes called a masked PLD, but usually just PLD).

The first bipolar-based PALs, PLAs, and PLDs used programmable fuses or links.

CMOS PLDs usually employ floating-gate transistors.
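The difference between a PLA and a PAL is easiest to see by evaluating the two planes in software. In this sketch (the data layout and names are illustrative assumptions, not a device programming format) each AND-plane row is a product term, and the OR plane lists which product terms each output sums. In the PLA the OR plane is programmable, so outputs F and G can share the product term A·B; in the PAL the OR connections are fixed, so the shared term has to be programmed twice in the AND plane.

```python
from typing import Dict, List

ProductTerm = Dict[str, int]    # input name -> required value (1 = true literal, 0 = complemented)
OrPlane = Dict[str, List[int]]  # output name -> indices of the product terms that are ORed


def and_plane(terms: List[ProductTerm], inputs: Dict[str, int]) -> List[int]:
    """Evaluate every product term as the AND of its literals."""
    return [int(all(inputs[name] == value for name, value in term.items())) for term in terms]


def or_plane(connections: OrPlane, products: List[int]) -> Dict[str, int]:
    """Each output is the OR of the product terms connected to it."""
    return {out: int(any(products[i] for i in idx)) for out, idx in connections.items()}


def evaluate(terms: List[ProductTerm], connections: OrPlane, inputs: Dict[str, int]) -> Dict[str, int]:
    return or_plane(connections, and_plane(terms, inputs))


# PLA: both planes programmable. F = A.B + B'.C and G = A.B share product term 0.
pla_terms = [{"A": 1, "B": 1}, {"B": 0, "C": 1}]
pla_or = {"F": [0, 1], "G": [0]}

# PAL: OR plane fixed (terms 0-1 feed F, term 2 feeds G), so A.B must be programmed twice.
pal_terms = [{"A": 1, "B": 1}, {"B": 0, "C": 1}, {"A": 1, "B": 1}]
PAL_FIXED_OR = {"F": [0, 1], "G": [2]}

print(evaluate(pla_terms, pla_or, {"A": 1, "B": 0, "C": 1}))
print(evaluate(pal_terms, PAL_FIXED_OR, {"A": 1, "B": 0, "C": 1}))
```

Both versions compute the same outputs; the PAL simply trades product-term sharing for a simpler, faster fixed OR plane.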


Programmable Logic Devices

[Figure: PLA and PAL structures, with a PLA design example.]

[Figure 7.19: Basic macrocell logic: AND plane followed by a flip-flop, with CLK and OE.]

- Suitable to implement sequential logic.
- Very fast.


Semi-Custom ASICs (Cont'd)
Programmable ASICs (Cont'd)
Structure of a CPLD / FPGA

[Figure: CPLD structure made of PAL-like blocks surrounded by I/O blocks. (b) Pin grid array (PGA) package (bottom view).]
Essential characteristics of FPGA

- Core: a regular array of programmable basic logic cells that can implement combinational or sequential logic
- A matrix of programmable interconnect surrounds the basic logic cells
- Programmable I/O cells surround the core
- A method of programming the basic logic cells and interconnect
- None of the mask layers are customized
- Design turnaround is a few hours

Difference between PLD and FPGA:
- FPGAs are larger and more complex than PLDs
Why FPGA-based ASIC Design?

The choice is based on many factors:
- Speed
- Gate density
- Cost
- Development time
- Prototyping and simulation time
- Manufacturing lead time
- Future modifications
- Inventory risk
- Cost

[Figure: chart rating FPGA/FPLD, discrete logic, and custom logic against each requirement (speed, gate density, cost, development time, prototyping and simulation, manufacturing, future modification, inventory, development tools); legend: "Very Effective".]


Different Categorizations of FPGAs

- Based on functional unit / logic cell structure:
  - Transistor pairs
  - Basic logic gates: NAND/NOR
  - MUX
  - Look-up tables (LUT)
  - Wide-fan-in AND-OR gates
- Programming technology:
  - Antifuse technology
  - SRAM technology
  - EPROM technology
- Gate density
- Chip architecture (routing style)

[Figure: FPGA architecture showing logic blocks, interconnection switches, and I/O blocks.]


Different Types of Logic Cells

FPGA Architecture: Functional Units
- RAM blocks (Xilinx): implement the function's truth table, addressed by the inputs
- Multiplexers (Actel): build Boolean functions using muxes
- Logic gates and flip-flops: dedicated logic such as carry chains, used for high-performance computations
Different Types of Logic Cells (Cont'd)

Actel ACT Logic Module Structure
- Uses antifuse programming technology
- Based on a channeled gate-array architecture
- The logic cell is a MUX, which can be configured as multi-input logic gates

[Figure: The Actel ACT 2 and ACT 3 Logic Modules. (a) The C-Module for combinational logic. (b) The ACT 2 S-Module. (c) The ACT 3 S-Module. (d) The equivalent circuit (without buffering) of the SE (sequential element). (e) The sequential element configured as a positive-edge-triggered D flip-flop. (Source: Actel.)]
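A MUX-based logic module works because a 2:1 multiplexer is functionally complete when its data inputs can be tied to constants: by Shannon expansion, f = x'·f(x=0) + x·f(x=1), so any function can be built from a tree of multiplexers. The sketch below shows that general idea; it is not a model of the actual Actel C-Module or S-Module wiring, and the function names and example function are illustrative.

```python
from typing import List, Sequence


def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: d0 when sel == 0, d1 when sel == 1."""
    return d1 if sel else d0


# A single multiplexer configured as a gate by tying its data inputs to constants or signals
def and2(a: int, b: int) -> int:
    return mux2(a, 0, b)  # a = 0 -> 0, a = 1 -> b


def or2(a: int, b: int) -> int:
    return mux2(a, b, 1)  # a = 0 -> b, a = 1 -> 1


def mux_tree(table: Sequence[int], inputs: List[int]) -> int:
    """Evaluate a truth table with a tree of 2:1 muxes (Shannon expansion on the last input).

    table[k] is the output for the input combination whose binary value is k,
    with inputs[0] as the least significant bit.
    """
    if len(table) == 1:
        return table[0]
    half = len(table) // 2
    low = mux_tree(table[:half], inputs[:-1])   # cofactor with the last input = 0
    high = mux_tree(table[half:], inputs[:-1])  # cofactor with the last input = 1
    return mux2(inputs[-1], low, high)


def f(a: int, b: int, c: int) -> int:
    """Example function f = A.B + B'.C."""
    return (a & b) | ((1 - b) & c)


table = [f(k & 1, (k >> 1) & 1, (k >> 2) & 1) for k in range(8)]
print(mux_tree(table, [1, 0, 1]))  # evaluate f at A=1, B=0, C=1
```

A real logic module uses a small fixed tree and lets the router connect signals or constants to the select and data inputs, which is what the antifuses program.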
[Figure: Xilinx CLB detail: four control lines per CLB for internal control or SRAM control; programmable MUXes; connections to/from the adjacent CLB. (c) Storage cell contents in the LUT.]
Different Types of Logic Cells (Cont'd)

Altera FLEX / MAX Logic Element Structure
- FLEX 8k/10k devices: SRAM-based LUTs; the Logic Elements (LEs) are similar to those used in the XC5200 FPGA

[Figure: Altera FLEX Logic Array Block (LAB) with local interconnect and carry and cascade chains.]

[Figure: The Altera MAX architecture. (a) Organization of logic and interconnect. (b) A MAX family LAB (Logic Array Block) with 16 macrocells per LAB connected to the chipwide interconnect. (c) A MAX family macrocell (clock, clear and enable logic, parallel expander, macrocell feedback and macrocell output). The macrocell details vary between the MAX families; the functions shown here are closest to those of the MAX 9000 family.]
Different Types of Logic Cells (Cont'd)

To summarize, FPGAs from various vendors differ in their:
- Architecture (row-based or matrix-based routing mechanism)
- Gate density (capacity in equivalent two-input NAND gates)
- Basic cell structure
- Programming technology

[Table: Vendor | Product | Architecture | Capacity | Basic Cell | Programming Technology; the table rows are not recoverable from the source.]
Programming Technologies

Three programming technologies:
- The antifuse technology
- Static RAM technology
- EPROM and EEPROM technology

[Figure: antifuse, static RAM, EPROM, and EEPROM programming elements.]
Antifuse

[Figure: antifuse structures. [a]-[b] Poly-diffusion antifuse with an oxide-nitride-oxide (ONO) dielectric between polysilicon and diffusion; [c]-[d] metal-metal antifuse with amorphous Si; each with a histogram of programmed antifuse resistance (percentage vs. resistance).]
Programming Technologies (Cont'd)

The Antifuse Technology
- Invented at Stanford and developed by Actel
- The opposite of regular fuse technology
- Normally an open circuit until a programming current (about 5 mA) is forced through it
- Two types:
  - Actel's PLICE [Programmable Low-Impedance Circuit Element]: a high-resistance poly-diffusion antifuse
    - The ONO (oxide-nitride-oxide) layer offers high resistance
    - For 5 mA, the programmed resistance is about 500 Ω
    - Programming time: 5-10 minutes
    - Disadvantages:
      - It does not allow a large amount of current
      - It needs extra space to connect with the metal layer, which adds parasitic capacitance and an unwanted long delay

[Figure: [a] Actel antifuse cross section. [b] Link in Actel antifuse. [c] Actel antifuse in metal contacts. [d] Actel antifuse resistance.]
Programming Technologies (Cont'd)

QuickLogic's low-resistance metal-metal antifuse [ViaLink] technology
- Direct metal-to-metal connections
- Higher programming currents reduce the antifuse resistance
- For 15 mA, the resistance is about 80 Ω
- Advantage: no extra parasitic capacitance, which reduces delay

[Figure: [a] QuickLogic antifuse, two-level metal. [b] QuickLogic antifuse, three-level metal. [c] QuickLogic antifuse in metal contacts (amorphous Si). [d] QuickLogic antifuse resistance.]

Disadvantages of antifuse technology:
- Unwanted RC delay
- OTP (one-time programmable) technology
- Less reliable: electromigration
- Needs a separate programming box (Activator)
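The "unwanted RC delay" of a programmed path can be estimated with the Elmore delay of an RC ladder: when a connection passes through N programmed antifuses in series, the delay at the far end is approximately the sum, over every node i, of that node's capacitance times the total resistance between the driver and node i. For N identical stages of resistance R and node capacitance C this becomes R·C·N(N+1)/2. The sketch below takes R = 80 Ω from the ViaLink figure above; the 100 fF node capacitance is a hypothetical value, and the driver's own resistance is ignored.

```python
from typing import Sequence


def elmore_delay(resistances: Sequence[float], capacitances: Sequence[float]) -> float:
    """Elmore delay (seconds) at the far end of an RC ladder.

    resistances[i] is the i-th series element (ohms) and capacitances[i] the node
    capacitance after it (farads). Driver resistance is not included.
    """
    delay, upstream_r = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # total resistance between the driver and this node
        delay += upstream_r * c  # this node's capacitance charges through that resistance
    return delay


def uniform_chain_delay(r: float, c: float, n: int) -> float:
    """Closed form for n identical stages: R*C*n*(n+1)/2."""
    return r * c * n * (n + 1) / 2


R_VIALINK = 80.0  # ohms, programmed metal-metal antifuse (value quoted in the notes)
C_NODE = 100e-15  # farads, hypothetical wiring capacitance per node

for n in (1, 2, 4):
    print(n, "antifuses:", uniform_chain_delay(R_VIALINK, C_NODE, n), "s")
```

Because the delay grows roughly with N², long paths through many antifuses slow down quickly, and substituting a higher-resistance antifuse scales the whole result up linearly.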
Programming Technologies (Cont'd)

Static RAM Technology
- SRAM cells are used:
  - as look-up tables (LUTs) to implement logic (as truth tables)
  - as embedded RAM blocks (for buffer storage, etc.)
  - as control for routing and configuration switches
- Each SRAM cell is two cross-coupled inverters built in a standard CMOS process; the configuration cell makes or breaks a connection

[Figure: SRAM configuration cell.]

- Advantages:
  - allows In-System Programming (ISP)
  - suitable for reconfigurable hardware
- Disadvantages:
  - volatile: needs power all the time, so a PROM is used to download the configuration data
  - larger in size than an antifuse
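A short model of the SRAM points above: a k-input LUT is 2^k configuration bits addressed by its inputs, reprogramming is just rewriting those bits, and because SRAM is volatile the bits must be reloaded (for example from a configuration PROM) after every power-up. The class and method names are hypothetical; the sketch only illustrates the behaviour, not any vendor's configuration interface.

```python
from typing import Callable, List, Optional


def truth_table(k: int, func: Callable[..., int]) -> List[int]:
    """Configuration bits for a k-input function; address bit i drives input i."""
    return [func(*[(addr >> i) & 1 for i in range(k)]) & 1 for addr in range(2 ** k)]


class SramLut:
    def __init__(self, k: int) -> None:
        self.k = k
        self.bits: Optional[List[int]] = None  # SRAM contents are undefined until configured

    def configure(self, bitstream: List[int]) -> None:
        if len(bitstream) != 2 ** self.k:
            raise ValueError(f"a {self.k}-input LUT needs {2 ** self.k} bits")
        self.bits = list(bitstream)  # in-system (re)programming: just rewrite the bits

    def power_off(self) -> None:
        self.bits = None  # volatile: the configuration is lost

    def __call__(self, *inputs: int) -> int:
        if self.bits is None:
            raise RuntimeError("LUT not configured")
        addr = sum((x & 1) << i for i, x in enumerate(inputs))
        return self.bits[addr]


config_prom = truth_table(4, lambda a, b, c, d: a ^ b ^ c ^ d)  # 4-input parity
lut = SramLut(4)
lut.configure(config_prom)  # power-up: download the configuration
print(lut(1, 0, 1, 1))
lut.power_off()             # configuration lost
lut.configure(config_prom)  # must be reloaded after power is restored
```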


Programming Technologies (Cont'd)

EPROM and EEPROM Technology (Altera & Xilinx)

[Figure: (a)-(c) EPROM transistor operation; once programmed, no channel forms between the device terminals.]

- An EPROM cell is almost as small as an antifuse
- Floating-Gate Avalanche MOS (FAMOS) technology:
  - Under normal voltage, the transistor is on
  - With the programming voltage applied, we can turn it off (configure it) to implement our logic
  - Exposure to a UV lamp (about one hour) erases the programming
  - EEPROM allows quick reconfiguration; ISP is also possible
Programming Technologies (Cont'd)

Summary Sheet: Programmable ASIC technologies

| Vendor | Programming technology | Size of programming element | Process | Programming method |
|---|---|---|---|---|
| Actel | Poly-diffusion antifuse, PLICE | Small, but requires contacts to metal | Special: CMOS plus three extra masks | Special hardware |
| Xilinx LCA* | Erasable SRAM, ISP | Two inverters plus pass and switch devices. Largest. | Standard CMOS | PC card, PROM, or serial port |
| QuickLogic | Metal-metal antifuse, ViaLink | Smallest | Special: CMOS plus ViaLink | Special hardware |
| Crosspoint | Metal-polysilicon antifuse | Small | Special: CMOS plus antifuse | Special hardware |
| Altera EPLD | UV-erasable EPROM (MAX 5k); EEPROM (MAX 9k) | One n-channel EPROM device. Medium. | Standard EPROM and EEPROM | ISP (MAX 9k) or EPROM programmer |
| Atmel | Erasable SRAM, ISP | Two inverters plus pass and switch devices. Largest. | Standard CMOS | PC card, PROM, or serial port |
| Xilinx EPLD | UV-erasable EPROM | One n-channel EPROM device. Medium. | Standard EPROM | EPROM programmer |
| Altera FLEX | Erasable SRAM, ISP | Two inverters plus pass and switch devices. Largest. | Standard CMOS | PC card, PROM, or serial port |

*Lucent (formerly AT&T) FPGAs have almost identical properties to the Xilinx LCA family.


ASIC Design Process

S-1 Design Entry: schematic entry or HDL description

S-2 Logic Synthesis: using Verilog HDL or VHDL and a synthesis tool, produce a netlist (the logic cells and their interconnect details)

S-3 System Partitioning: divide a large system into ASIC-sized pieces

S-4 Pre-Layout Simulation: check design functionality

S-5 Floorplanning: arrange the netlist blocks on the chip

S-6 Placement: fix cell locations in a block

S-7 Routing: make the cell and block interconnections

S-8 Extraction: determine the interconnect resistance and capacitance (R/C)

S-9 Post-Layout Simulation

[Figure: ASIC design flow. Logical design: VHDL/Verilog design entry, synthesis, system partitioning, prelayout simulation. Physical design: floorplanning, placement, routing, circuit extraction, postlayout simulation using the back-annotated netlist.]
ASIC Design Flow

1. Design entry. Enter the design into an ASIC design system, using either a hardware description language (HDL) or schematic entry.
2. Logic synthesis. Use an HDL (VHDL or Verilog) and a logic synthesis tool to produce a netlist: a description of the logic cells and their connections.
3. System partitioning. Divide a large system into ASIC-sized pieces.
4. Prelayout simulation. Check to see if the design functions correctly.
5. Floorplanning. Arrange the blocks of the netlist on the chip.
6. Placement. Decide the locations of cells in a block.
7. Routing. Make the connections between cells and blocks.
8. Extraction. Determine the resistance and capacitance of the interconnect.
9. Postlayout simulation. Check that the design still works with the added loads of the interconnect.

Steps 1-4 are part of logical design, and steps 5-9 are part of physical design.

- There is some overlap. For example, system partitioning might be considered as either logical or physical design. To put it another way, when we are performing system partitioning we have to consider both logical and physical factors.
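The split between logical and physical design can be written as a simple lookup over the nine steps. The enum names are illustrative; system partitioning is tagged with both phases to reflect the overlap noted above.

```python
from enum import IntEnum
from typing import FrozenSet


class Step(IntEnum):
    DESIGN_ENTRY = 1
    LOGIC_SYNTHESIS = 2
    SYSTEM_PARTITIONING = 3
    PRELAYOUT_SIMULATION = 4
    FLOORPLANNING = 5
    PLACEMENT = 6
    ROUTING = 7
    EXTRACTION = 8
    POSTLAYOUT_SIMULATION = 9


def phases(step: Step) -> FrozenSet[str]:
    """Steps 1-4 are logical design and 5-9 physical; partitioning touches both."""
    if step is Step.SYSTEM_PARTITIONING:
        return frozenset({"logical", "physical"})
    if step <= Step.PRELAYOUT_SIMULATION:
        return frozenset({"logical"})
    return frozenset({"physical"})


for step in Step:
    print(step.value, step.name, sorted(phases(step)))
```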


ASIC Design Process (Cont'd)

Altera FPGA Design Flow: a self-contained system that does everything from design entry, simulation and synthesis to programming of Altera devices

[Figure: MAX+PLUS II performs design entry, synthesis, and place & route; EDIF/VHDL netlists can come in from other tools, and timing-annotated netlists in Verilog HDL, VHDL, EDIF, SDF, or VITAL go out for pre- and post-layout simulation.]

- EDIF: Electronic Design Interchange Format
- SDF: Standard Delay Format
ASIC Design Process (Cont'd)

Xilinx FPGA Design Flow: allows third-party design entry software and accepts the netlist file it generates as an input

[Figure: Xilinx design flow. Design entry using Xilinx cells -> XNF netlist -> simulation with unit delays; xmake partitions the logic into CLBs -> .LCA netlist; ppr/apr place and route -> back-annotated .XNF netlist with delays -> postlayout simulation; makebits creates the programming file (.BIT file) for the Xilinx device.]

- XNF: Xilinx Netlist Format
- LCF: Library Container File
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FPGA vs. ASIC 


FPGA

Advantages:
- Faster time-to-market
- No NRE (non-recurring expenses)
- Simpler design cycle
- More predictable project cycle
- Field reprogrammability
- Reusability
- Good for prototyping
- Unlike ASICs, FPGAs have special hardware such as Block RAM, DCM modules, MACs,
  memories, high-speed I/O, and other embedded blocks built in.
- FPGA synthesis is much easier than ASIC synthesis.

Disadvantages:
- Higher unit cost
- Slow: difficult to achieve high frequency
- Power consumption in an FPGA is higher, and you don't have any control over power
  optimization. This is where ASIC wins the race!
- You have to use the resources available in the FPGA, so the FPGA limits the design size.

ASIC

Advantages:
- Cost, cost, cost: lower unit costs
- Speed, speed, speed: ASICs are faster than FPGAs
- Low power, low power, low power
- In an ASIC you can implement analog circuits and mixed-signal designs. This is
  generally not possible in an FPGA.
- In an ASIC, DFT (Design For Test) is inserted. In an FPGA, DFT is not carried out
  (for an FPGA there is no need of DFT!).

Disadvantages:
- Longer time-to-market
- High NRE
- Design issues such as DFM and SI. In an FPGA you don't have all these, because the
  ASIC designer takes care of them. (Don't forget: an FPGA is an IC and is designed by
  an ASIC design engineer!)
- Expensive tools: ASIC design tools are very expensive. You spend a huge amount of ...


Introduction To ASIC-SoC Design By http://asic-soc.blogspot.com


FPGA Vs ASIC Design Flow

[Figure: FPGA vs. ASIC design flow. ASIC: VHDL/Verilog, simulation, synthesis, create test vectors, ASIC vendor place & route, static timing, sign-off, fab prototype (3-6 weeks), ASIC production (8-16 weeks lead time). FPGA: VHDL/Verilog, simulation, synthesis, user mapping and place & route by the FPGA designer, optional simulation, program prototype (instant), FPGA volume production ("off the shelf"). ECO: Engineering Change Order.]

ASIC Design Flow Using Cadence Tool
Start
- Verilog coding using gedit
- Functional verification using SimVision (Incisive tool platform, NCLAUNCH)
- Synthesis (Encounter RTL Compiler)
- Timing simulation
- DFT (Design For Testability)
- Floorplanning
- Power planning
- Placement
- Clock tree
- DRC/LVS
(Encounter RTL to GDSII)

GDS: Graphical Data Stream Information Interchange


UNIT I & II
ASIC Construction &
System Partitioning


Dr. K. Kalyani
AP, ECE,
TCE


Physical Design Steps 


[Figure: part of the ASIC design flow, from design entry through system partitioning, floorplanning, placement, and routing.]

Part of an ASIC design flow showing
the system partitioning, floorplanning,
placement, and routing steps. These steps may be
performed in a slightly different order,
iterated, or omitted depending on the
type and size of the system and its
ASICs.

Floorplanning assumes an increasingly
important role.

Sequential: each of the steps shown in
the figure must be performed, and each
depends on the previous step.

Parallel: however, the trend is
toward completing these steps in a
parallel fashion and iterating,
rather than in a sequential manner.
CAD Tools
System partitioning:
* Goal. Partition a system into a number of ASICs.
* Objectives.
  - Minimize the number of external connections between the ASICs.
  - Keep each ASIC smaller than a maximum size.

Floorplanning:
* Goal. Calculate the sizes of all the blocks and assign them locations.
* Objective. Keep the highly connected blocks physically close to each other.

Placement:
* Goal. Assign the interconnect areas and the location of all the logic cells within
the flexible blocks.
* Objectives. Minimize the ASIC area and the interconnect density.

Global routing:
* Goal. Determine the location of all the interconnect.
* Objective. Minimize the total interconnect area used.

Detailed routing:
* Goal. Completely route all the interconnect on the chip.
* Objective. Minimize the total interconnect length used.


Methods and Algorithms 


Each of the ASIC physical design steps, in general, belongs to a class
of mathematical problems known as NP-complete problems.

Definition: this means that it is unlikely we can find an algorithm that solves
the problem exactly in polynomial time.

Polynomial time: the time it takes to solve the problem grows with the size of
the problem at a rate bounded by a polynomial. If the time grows faster than
quadratically (or, worse, exponentially), exact methods quickly become impractical
for large ASICs.

A CAD tool therefore needs methods or algorithms that generate a good solution to each
problem using a reasonable amount of computer time.
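As a rough illustration of why exact methods are impractical, the Python sketch below counts how many ways there are to split 2n logic cells into two equal halves, C(2n, n)/2, which is the search space an exhaustive two-way partitioner would have to explore. The function name `num_bisections` is hypothetical; only the standard library is used.

```python
from math import comb

def num_bisections(num_cells: int) -> int:
    """Count the distinct ways to split num_cells logic cells into two equal halves."""
    if num_cells % 2:
        raise ValueError("num_cells must be even for a bisection")
    # comb(2n, n) chooses which cells go into partition A; dividing by 2 removes
    # the duplicates obtained by simply swapping the labels A and B.
    return comb(num_cells, num_cells // 2) // 2

for cells in (10, 20, 40):
    print(f"{cells} cells: {num_bisections(cells):,} bisections")
```

For only 40 cells the count is already about 6.9 × 10^10, and a real ASIC has many thousands of cells, so practical tools rely on heuristics such as the partitioning algorithms in this unit.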
Measurement or objective function


We need to make a quantitative measurement of the quality of the solution
that we are able to find.

Often we combine several parameters or metrics that measure our
goals and objectives into a measurement function or objective
function.


Cost function:
If we are minimizing the measurement function, we call it a cost function.

Gain function:
If we are maximizing the measurement function, we call the function a gain
function (sometimes just gain).
ASIC Physical Steps

* Each of the ASIC physical design steps is solved using:
  - a set of goals and objectives;
  - a way to measure the goals and objectives;
  - an algorithm or method to find a solution that meets the goals and
    objectives.
VLSI Physical Design: Partitioning
System partitioning requires:
- goals and objectives;
- methods and algorithms to find solutions;
- ways to evaluate these solutions.


Goal of partitioning
- Divide the system into a number of smaller systems.


Objectives of partitioning

We may need to take into account any or all of the following objectives:
- a maximum size for each ASIC;
- a maximum number of ASICs;
- a maximum number of connections for each ASIC;
- a maximum number of total connections between all ASICs.
Measuring Connectivity

[Figure: (a) A network of circuit modules A-F connected by nets through pins. (b) The equivalent network graph, with vertices and edges.]


Measuring Connectivity 


Figure (a) shows a circuit schematic, netlist, or network.
The network consists of circuit modules A-F. Equivalent terms for a
circuit module are cell, logic cell, macro, or block.
A cell or logic cell: a small logic gate (NAND, etc.) or a collection of other
cells.
Macro: gate-array cells.
Block: a collection of gates or cells.
Each logic cell has electrical connections between its terminals (connectors
or pins).
The network can be represented as the mathematical graph shown in
Figure (b).
A graph is like a spider's web:
- it contains vertexes (or vertices) A-F (also called graph nodes or points) that are
  connected by edges;
- a graph vertex corresponds to a logic cell;
- an electrical connection (a net or a signal) between two logic cells
  corresponds to a graph edge.
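Measuring Connectivity

[Figure: dividing a network into two partitions. (a) In the network, the dividing line crosses two nets: net cutset = two nets. (b) In the network graph, a single wire with several logic modules attached is modeled by multiple edges, so the same line crosses four edges: edge cutset = four edges.]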
Measuring Connectivity

* Net cutset
  - Divide the network into two by drawing a line across connections; this makes
    net cuts. The resulting set of net cuts is the net cutset.
  - The number of net cuts is the number of external connections between the
    two partitions in a network.
* Edge cutset
  - When we divide the network graph into the same partitions, we make
    edge cuts and create the edge cutset.
  - The number of edge cuts is the number of external connections between the
    two partitions in a graph.
  - The number of edge cuts in a graph is not necessarily equal to the
    number of net cuts in the network.
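The difference between the two counts can be checked with a small Python sketch. It uses a hypothetical network, given as a dictionary from net name to the list of cells on that net, and assumes that every k-terminal net is modeled in the graph as a clique of k(k-1)/2 edges, which is one common way of representing a multi-terminal net by multiple edges.

```python
from itertools import combinations

def net_cuts(nets, part_a):
    """Count nets that have terminals on both sides of the partition."""
    cut = 0
    for cells in nets.values():
        inside = sum(1 for c in cells if c in part_a)
        if 0 < inside < len(cells):
            cut += 1
    return cut

def edge_cuts(nets, part_a):
    """Model every net as a clique of edges and count the edges that cross."""
    cut = 0
    for cells in nets.values():
        for u, v in combinations(cells, 2):
            if (u in part_a) != (v in part_a):
                cut += 1
    return cut

# Hypothetical network: net "y" has three terminals (B, C, D).
nets = {"x": ["A", "B"], "y": ["B", "C", "D"], "z": ["D", "E"], "w": ["E", "F"]}
part_a = {"A", "B", "C"}
print(net_cuts(nets, part_a), edge_cuts(nets, part_a))
```

Only net y crosses the partition, so there is one net cut, but its clique model contributes two crossing edges (B-D and C-D), so the edge cutset has two edges.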


Estimating ASIC size

* Estimate the die size of a 40 k-gate ASIC in a 0.35 µm gate-array,
three-level metal process with 166 I/O pads.

* Die size includes core size and I/O size.

* Core size (logic and routing) = (gates / gate density) × routing factor
× (1 / gate-array utilization)
  - Gate density = standard-cell density × gate-array utilization

* I/O (pad-limited) die size = a², where a is one side of the die.
  - One side of the die = number of I/O pads on a side × I/O-pad pitch

1 µm (micron) = 0.0393701 mil

(1 mil = one thousandth of an inch)
Estimating ASIC size

Some useful numbers for ASIC estimates, normalized to a 1 µm technology:

Parameter | Typical value | Comment
λ (minimum feature size) | 0.5 µm | In a 1 µm technology, λ ≈ 0.5 µm.
Effective gate length | 0.25 to 1.0 µm | Less than the drawn gate length, usually by about 10 percent.
I/O-pad width (pitch) | 5 to 10 mil = 125 to 250 µm | For a 1 µm technology, 2LM (λ = 0.5 µm). Scales less than linearly with λ.
I/O-pad height | 15 to 20 mil = 375 to 500 µm | For a 1 µm technology, 2LM (λ = 0.5 µm). Scales approximately linearly with λ.
Large die | 1000 mil/side, 10^6 mil² | Approximately constant.
Small die | 100 mil/side, 10^4 mil² | Approximately constant.
Standard-cell density | 1.5 × 10^-3 gate/µm² = 1.0 gate/mil² | For a 1 µm, 2LM library: 4 × 10^-4 gate/λ² (independent of scaling).
Standard-cell density | 8 × 10^-3 gate/µm² = 5.0 gate/mil² | For a 0.5 µm, 3LM library: 5 × 10^-4 gate/λ² (independent of scaling).
Gate-array utilization | 60 to 80% | For 2LM and for 3LM, approximately constant.
Gate-array density | (0.8 to 0.9) × standard-cell density | For the same process as standard cells.
Standard-cell routing factor | 1.5 to 2.5 (2LM); 1.0 to 2.0 (3LM) | Approximately constant.
Package cost | $0.01/pin ("penny per pin") | Varies widely; figure is for a low-cost plastic package; approximately constant.


Estimating ASIC size
For this ASIC the minimum feature size is 0.35 µm, so λ = 0.175 µm.


Gate density:
Gate density = standard-cell density × gate-array utilization
Using the 3LM standard-cell density of $5 \times 10^{-4}$ gate/λ² (independent of scaling):

$\text{gate density} = 5 \times 10^{-4} \times (0.8 \text{ to } 0.9) = 4 \times 10^{-4} \text{ to } 4.5 \times 10^{-4}\ \text{gate}/\lambda^2$

Core size:
Core size (logic and routing) = (gates / gate density) × routing factor × (1 / gate-array utilization)

$= \frac{4 \times 10^{4}}{4 \times 10^{-4} \text{ to } 4.5 \times 10^{-4}} \times (1 \text{ to } 2) \times \frac{1}{0.8 \text{ to } 0.9} \approx 10^{8} \text{ to } 2.5 \times 10^{8}\ \lambda^2$

$= 4840 \text{ to } 11{,}900\ \text{mil}^2$
Estimating ASIC size


Die size
Number of I/O pads = 166.
Number of pads per side
= 166/4 = 41.5, rounded up to 42 I/O pads per side.
If the I/O-pad pitch = 5 mil, then
one side of the die
= 5 × 42 = 210 mil.
Minimum die size required to fit 166 I/O pads
= 210 × 210 ≈ 4.4 × 10^4 mil².
Die area utilized by core logic (using the upper core estimate)
= 1.19 × 10^4 / 4.4 × 10^4 ≈ 27%.
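The same estimate can be scripted so that other gate counts or pad counts are easy to try. This Python sketch is only a restatement of the formulas above under the example's assumptions (λ = half the feature size, 5 × 10^-4 gate/λ² standard-cell density, routing factor 1 to 2, utilization 0.8 to 0.9, 5 mil pad pitch); the function names are hypothetical.

```python
import math

MIL_IN_UM = 25.4  # 1 mil = 0.001 inch = 25.4 um

def core_area_mil2(gates, gate_density_per_lambda2, routing_factor,
                   utilization, lambda_um):
    """Core size = (gates / gate density) x routing factor x (1 / utilization), in mil^2."""
    area_lambda2 = gates / gate_density_per_lambda2 * routing_factor / utilization
    return area_lambda2 * (lambda_um / MIL_IN_UM) ** 2

def pad_limited_die_mil2(num_pads, pad_pitch_mil):
    """Square die whose four sides just hold all the I/O pads at the given pitch."""
    pads_per_side = math.ceil(num_pads / 4)
    side = pads_per_side * pad_pitch_mil
    return side * side

lam = 0.35 / 2                  # lambda is half the minimum feature size
std_cell_density = 5e-4         # gate/lambda^2, 3LM row of the table
low = core_area_mil2(40_000, std_cell_density * 0.9, 1.0, 0.9, lam)
high = core_area_mil2(40_000, std_cell_density * 0.8, 2.0, 0.8, lam)
die = pad_limited_die_mil2(166, 5)
print(f"core: {low:.0f} to {high:.0f} mil^2, die: {die} mil^2, "
      f"utilization: {high / die:.0%}")
```

The upper core estimate (about 11,900 mil²), the 44,100 mil² pad-limited die, and the 27% utilization agree with the slide. Without intermediate rounding the lower core estimate comes out near 4,690 mil², slightly below the slide's 4,840 mil², so treat the lower bound as approximate.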
Partitioning of a Circuit


A Simple Partitioning Example

[Figure: a network of logic cells partitioned into ASIC 1, ASIC 2, and ASIC 3.]

*FIGURE 15.7 Partitioning example. (a) We wish to partition our simple network into ASICs.
Objectives are the following:
- Use no more than three ASICs.
- Each ASIC is to contain no more than four logic cells.
- Use the minimum number of external connections for each ASIC.
(b) A partitioning with five external connections (nets 2, 4, 5, 6, and 8), the minimum number.


Types of Partitioning


* Splitting a network into several pieces is the network partitioning
problem.

* Two types of algorithms are used in system partitioning:
  - Constructive partitioning: uses a set of rules to find a solution.
  - Iterative partitioning improvement (or iterative partitioning
    refinement): takes an existing solution and tries to improve it.
Constructive Partitioning


* The most common constructive partitioning algorithms are seed growth or cluster
growth.

* The steps of a simple seed-growth algorithm for constructive partitioning:

1. Start a new partition with a seed logic cell.
2. Consider all the logic cells that are not yet in a partition. Select each of
these logic cells in turn.
3. Calculate a gain function, g(m), that measures the benefit of adding
logic cell m to the current partition. One measure of gain is the number
of connections between logic cell m and the current partition.
4. Add the logic cell with the highest gain g(m) to the current partition.
5. Repeat the process from step 2. If you reach the limit of logic cells in a
partition, start again at step 1.
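The five steps map directly onto a short loop. In this Python sketch the network is a hypothetical dictionary mapping each logic cell to {neighbour: number of connections}; the seed is the unplaced cell with the most connections (the choice recommended for the seed logic cell), g(m) is the number of connections between m and the current partition, and ties are broken alphabetically, an assumption made only to keep the result deterministic.

```python
def seed_growth(adj, max_size):
    """Constructive partitioning by seed growth.

    adj: {cell: {neighbour: number_of_connections}} (symmetric)
    max_size: maximum number of logic cells allowed in one partition
    """
    unplaced = set(adj)
    partitions = []
    while unplaced:
        # Step 1: seed a new partition with the unplaced cell that has most connections.
        seed = max(sorted(unplaced), key=lambda c: sum(adj[c].values()))
        current = {seed}
        unplaced.remove(seed)
        # Steps 2-5: add the best remaining cell until the partition is full.
        while unplaced and len(current) < max_size:
            # g(m) = number of connections between cell m and the current partition
            best = max(sorted(unplaced),
                       key=lambda m: sum(w for n, w in adj[m].items() if n in current))
            current.add(best)
            unplaced.remove(best)
        partitions.append(current)
    return partitions

# Hypothetical six-cell network: two loosely joined groups.
adj = {
    "A": {"B": 2, "C": 1},
    "B": {"A": 2, "C": 1},
    "C": {"A": 1, "B": 1, "D": 1},
    "D": {"C": 1, "E": 2, "F": 1},
    "E": {"D": 2, "F": 1},
    "F": {"D": 1, "E": 1},
}
print(seed_growth(adj, max_size=3))
```

The result is two partitions, {D, E, F} and {A, B, C}, joined by the single connection C-D. Like any constructive method, the outcome depends on the seed and can be a local minimum, which is why iterative improvement usually follows.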


Constructive Partitioning
* Seed logic cell:
The logic cell with the most nets is a good choice as the
seed logic cell.
* Cluster:
Instead of a single seed, we can use a set of seed logic cells, known as a cluster.
A cluster is also called a clique, a term borrowed from graph theory.
* Clique:
A clique of a graph is a subset of nodes where each pair of
nodes is connected by an edge.


Constructive Partitioning


* A constructed partition using logic cell C as a seed. It is
difficult to get from this local minimum, with seven external
connections (2, 3, 5, 7, 9, 11, 12), to the optimum solution of (b).


Improvement in Partitioning

[Figure: two partitionings into ASIC 1, ASIC 2, and ASIC 3: Fig 1 with 5 external connections and Fig 2 with 7 external connections.]

* To get from the solution shown in Fig 2 to the solution of Fig 1,
which has the minimum number of external connections, requires a
complicated swap.

* The three pairs D and F, J and K, and C and L need to be swapped.

Iterative Partitioning Improvement


Algorithms are based on the interchange method and the group migration method.


Interchange method (swapping single logic cells):
If the swap improves the partition, accept the trial interchange; otherwise, select a
new set of logic cells to swap.
Example: greedy algorithm
- It considers only one change at a time.
- It rejects the change immediately if it is not an improvement.
- It accepts a move only if it provides an immediate benefit.

As a result, it tends to get stuck in what is known as a local minimum.
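A greedy interchange is easy to sketch in Python. In this version a trial move swaps one cell from each side (so partition sizes stay fixed), and the move is kept only if it lowers the cut weight. The example graph is the six-node weighted graph of the Kernighan-Lin worked example later in this unit; the function names are hypothetical.

```python
def cut_weight(adj, part_a):
    """Total weight of the connections that cross from part_a to the other partition."""
    return sum(w for u in part_a for v, w in adj[u].items() if v not in part_a)

def greedy_interchange(adj, part_a, part_b):
    """Greedy pairwise interchange: accept a trial swap only if it lowers the cut."""
    part_a, part_b = set(part_a), set(part_b)
    cut = cut_weight(adj, part_a)
    improved = True
    while improved:
        improved = False
        for a in sorted(part_a):
            for b in sorted(part_b):
                trial_a = (part_a - {a}) | {b}
                trial_cut = cut_weight(adj, trial_a)
                if trial_cut < cut:            # immediate benefit: accept the swap
                    part_a, part_b = trial_a, (part_b - {b}) | {a}
                    cut, improved = trial_cut, True
                    break
            if improved:                       # rescan from the new partition
                break
    return part_a, part_b, cut               # no single swap helps: a local minimum

edges = [("a", "b", 3), ("a", "c", 4), ("a", "e", 2), ("b", "c", 1), ("b", "d", 2),
         ("c", "d", 2), ("c", "e", 3), ("c", "f", 4), ("d", "f", 1), ("e", "f", 6)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w

print(greedy_interchange(adj, {"a", "c", "e"}, {"b", "d", "f"}))
```

Starting from a cut weight of 16, the first improving swap (a with f) reaches 10 and no further single swap helps. On this small graph 10 happens to be the best balanced cut, but in general a greedy method stops at the first local minimum it meets.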


Group migration (swapping groups of logic cells):
* Group migration consists of swapping groups of logic cells between partitions.

* Group migration algorithms:
  - Advantage: better than simple interchange methods at improving a solution.
  - Disadvantage: more complex.


Example: Kernighan-Lin algorithm (K-L)
- Min-cut problem: dividing a graph into two pieces while minimizing the
nets (edges) that are cut.


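The Kernighan-Lin Algorithm

[Figure: a network graph and its connectivity matrix C, whose entry $c_{ij}$ is the weight (number of connections) of the edge between nodes i and j.]


The Kernighan-Lin Algorithm (contd.,)


Total external cost, cut cost, or cut weight:

$W = \sum_{a \in A,\, b \in B} c_{ab}$

External edge cost of a node $a \in A$: $E_a = \sum_{y \in B} c_{ay}$
Internal edge cost of a node $a \in A$: $I_a = \sum_{z \in A} c_{az}$
Difference: $D_a = E_a - I_a$ (and similarly for the nodes $b \in B$)

Gain from swapping nodes a and b:

$g = D_a + D_b - 2c_{ab}$

where $c_{ab}$ is the connectivity-matrix entry between a and b. The edge a-b is counted once in $D_a$ and once in $D_b$, but it is still cut after the swap, hence the $-2c_{ab}$ term.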
The Kernighan-Lin Algorithm (contd.,)


The K-L algorithm finds a group of node pairs to swap that increases the gain,
even though swapping individual node pairs from that group might decrease the
gain.
The steps of the K-L algorithm are:

1. Find two nodes, $a_i$ from A and $b_i$ from B, so that the gain from swapping them is a
maximum. The gain is $g_i = D_{a_i} + D_{b_i} - 2c_{a_i b_i}$.
2. Next, pretend to swap $a_i$ and $b_i$, even if the gain $g_i$ is zero or negative, and do not
consider $a_i$ and $b_i$ eligible for being swapped again. (The D values of the remaining
nodes are updated to reflect the pretend swap.)
3. Repeat steps 1 and 2 a total of m times, until all the nodes of A and B have been
pretend swapped. We are back where we started, but we have ordered pairs of
nodes in A and B according to the gain from interchanging those pairs.


The Kernighan-Lin Algorithm (contd.,)


Now we can choose which nodes we shall actually swap. Suppose we swap only the
first n pairs of nodes that we found in the preceding process. In other words, we
swap nodes X = $a_1, a_2, \ldots, a_n$ from A with nodes Y = $b_1, b_2, \ldots, b_n$ from B.
The total gain would be

$G_n = \sum_{i=1}^{n} g_i$

4. We now choose n corresponding to the maximum value of $G_n$.
If the maximum value of $G_n > 0$, then swap the sets of nodes X and Y and thus
reduce the cut weight by $G_n$.
Use this new partitioning to start the process again at the first step.

If the maximum value of $G_n = 0$, then we cannot improve the current partitioning,
and we stop.
We have found a locally optimum solution.
[Figure 15.9 panels: (a) configuration 1, swapping node 1 of partition A with node 6 of partition B; (b) gain $g_i$ from swapping the i-th pair of nodes versus i, the number of pairs of nodes pretend swapped; (c) total gain $G_n$ from swapping the first n pairs of nodes versus n, the number of pairs of nodes actually swapped.]

*FIGURE 15.9 Partitioning a graph using the Kernighan-Lin algorithm.
*(a) Shows how swapping node 1 of partition A with node 6 of partition B results in a gain of g = 2.
*(b) A graph of the gain resulting from swapping pairs of nodes.
*(c) The total gain is equal to the sum of the gains obtained at each step.
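Kernighan-Lin Algorithm (1)


* Given:
Initial weighted graph G with
V(G) = {a, b, c, d, e, f}

[Figure: weighted graph G. The edge weights used in the calculations that follow are c_ab = 3, c_ac = 4, c_ae = 2, c_bc = 1, c_bd = 2, c_cd = 2, c_ce = 3, c_cf = 4, c_df = 1, c_ef = 6.]

* Start with any partition of
V(G) into X and Y, say
X = {a, c, e}
Y = {b, d, f}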


KL algorithm (2a)


* Compute the gain values of moving
node x to the other set:

$D_x = E_x - I_x$
$E_x$ = cost of edges connecting node x
with the other group (external)
$I_x$ = cost of edges connecting node x
within its own group (internal)

X = {a, c, e}:
$D_a = E_a - I_a = -3$ (= 3 - 4 - 2)
$D_c = E_c - I_c = 0$ (= 1 + 2 + 4 - 4 - 3)
$D_e = E_e - I_e = +1$ (= 6 - 2 - 3)
Y = {b, d, f}:
$D_b = E_b - I_b = +2$ (= 3 + 1 - 2)
$D_d = E_d - I_d = -1$ (= 2 - 2 - 1)
$D_f = E_f - I_f = +9$ (= 4 + 6 - 1)


KL algorithm (2b)


X = {a, c, e}
Y = {b, d, f}


* The cost saving when exchanging a and b is
essentially $D_a + D_b$.

* However, the cost saving of 3 from the direct
edge a-b was counted twice, and this edge
still connects the two groups after the exchange.

* Hence, the real "gain" (i.e., cost saving)
of this exchange is $g_{ab} = D_a + D_b - 2c_{ab}$.

$D_a = E_a - I_a = -3$ (= 3 - 4 - 2)
$D_b = E_b - I_b = +2$ (= 3 + 1 - 2)
$g_{ab} = D_a + D_b - 2c_{ab} = -7$ (= -3 + 2 - 2·3)


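KL algorithm (3)

* Compute the gains for the other pairs:
$g_{ad} = D_a + D_d - 2c_{ad} = -3 - 1 - 2 \cdot 0 = -4$
$g_{af} = D_a + D_f - 2c_{af} = -3 + 9 - 2 \cdot 0 = +6$
$g_{cb} = D_c + D_b - 2c_{cb} = 0 + 2 - 2 \cdot 1 = 0$
$g_{cd} = D_c + D_d - 2c_{cd} = 0 - 1 - 2 \cdot 2 = -5$
$g_{cf} = D_c + D_f - 2c_{cf} = 0 + 9 - 2 \cdot 4 = +1$
$g_{eb} = D_e + D_b - 2c_{eb} = +1 + 2 - 2 \cdot 0 = +3$
$g_{ed} = D_e + D_d - 2c_{ed} = +1 - 1 - 2 \cdot 0 = 0$
$g_{ef} = D_e + D_f - 2c_{ef} = +1 + 9 - 2 \cdot 6 = -2$
* The maximum gain is $g_{af} = +6$.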


KL algorithm (4)


* Pretend swap a and f: cut size = 16 - 6 = 10
* Then lock up
nodes a and f


KL algorithm (5)


* Update the D-values of the unlocked nodes:

$D'_c = D_c + 2c_{ca} - 2c_{cf} = 0 + 2(4 - 4) = 0$
$D'_e = D_e + 2c_{ea} - 2c_{ef} = +1 + 2(2 - 6) = -7$
$D'_b = D_b + 2c_{bf} - 2c_{ba} = +2 + 2(0 - 3) = -4$
$D'_d = D_d + 2c_{df} - 2c_{da} = -1 + 2(1 - 0) = +1$


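KL algorithm (6)


* Compute the gains:

$g'_{cb} = D'_c + D'_b - 2c_{cb} = 0 - 4 - 2 \cdot 1 = -6$
$g'_{cd} = D'_c + D'_d - 2c_{cd} = 0 + 1 - 2 \cdot 2 = -3$
$g'_{eb} = D'_e + D'_b - 2c_{eb} = -7 - 4 - 2 \cdot 0 = -11$
$g'_{ed} = D'_e + D'_d - 2c_{ed} = -7 + 1 - 2 \cdot 0 = -6$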


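KL algorithm (7)


* The maximum gain is $g'_{cd} = D'_c + D'_d - 2c_{cd} = 0 + 1 - 2 \cdot 2 = -3$
* Then lock up
nodes c and d


KL algorithm (8)

* Update the D-values of the remaining nodes e and b:
$D''_e = D'_e + 2c_{ec} - 2c_{ed} = -7 + 2(3 - 0) = -1$
$D''_b = D'_b + 2c_{bd} - 2c_{bc} = -4 + 2(2 - 1) = -2$
* Compute the gain:
$g''_{eb} = D''_e + D''_b - 2c_{eb} = -1 - 2 - 2 \cdot 0 = -3$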


KL algorithm (9)


* Summary of the gains:
  - $g_1 = +6$
  - $g_1 + g_2 = +6 - 3 = +3$
  - $g_1 + g_2 + g_3 = +6 - 3 - 3 = 0$

* Maximum gain = $g_1 = +6$

* Exchange only nodes a and f.

* End of pass 1.


* Repeat the Kernighan-Lin algorithm.
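The pass worked through by hand can be reproduced with a compact Python sketch of the Kernighan-Lin algorithm. It uses the same gain formula and the D-value update from KL algorithm (5), breaks ties between equal gains by node name (an arbitrary choice), and is written for clarity rather than speed. The edge list is the one recovered from the worked example; the function names are hypothetical.

```python
def kl_pass(adj, part_a, part_b):
    """One Kernighan-Lin pass. Returns (new_a, new_b, gains, best_total_gain)."""
    a_side, b_side = set(part_a), set(part_b)

    def c(x, y):
        return adj[x].get(y, 0)              # connectivity-matrix entry c_xy

    def d_value(x, own):
        # D_x = E_x - I_x: external minus internal edge cost
        return sum(-w if n in own else w for n, w in adj[x].items())

    D = {x: d_value(x, a_side) for x in a_side}
    D.update({y: d_value(y, b_side) for y in b_side})
    free_a, free_b = set(a_side), set(b_side)
    pairs, gains = [], []
    while free_a and free_b:
        # Step 1: unlocked pair with the largest gain g = D_a + D_b - 2c_ab
        g, a, b = max((D[x] + D[y] - 2 * c(x, y), x, y)
                      for x in free_a for y in free_b)
        pairs.append((a, b))
        gains.append(g)
        # Step 2: pretend swap, lock a and b, update D of the still-free nodes
        free_a.remove(a)
        free_b.remove(b)
        for x in free_a:
            D[x] += 2 * c(x, a) - 2 * c(x, b)
        for y in free_b:
            D[y] += 2 * c(y, b) - 2 * c(y, a)
    # Step 4: choose n that maximizes G_n = g_1 + ... + g_n
    best_n, best_total, running = 0, 0, 0
    for n, g in enumerate(gains, start=1):
        running += g
        if running > best_total:
            best_n, best_total = n, running
    for a, b in pairs[:best_n]:               # actually swap the first best_n pairs
        a_side.remove(a)
        a_side.add(b)
        b_side.remove(b)
        b_side.add(a)
    return a_side, b_side, gains, best_total

def kernighan_lin(adj, part_a, part_b):
    """Repeat passes until the best total gain is no longer positive."""
    while True:
        part_a, part_b, _, total = kl_pass(adj, part_a, part_b)
        if total <= 0:
            return part_a, part_b

edges = [("a", "b", 3), ("a", "c", 4), ("a", "e", 2), ("b", "c", 1), ("b", "d", 2),
         ("c", "d", 2), ("c", "e", 3), ("c", "f", 4), ("d", "f", 1), ("e", "f", 6)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w

new_a, new_b, gains, total = kl_pass(adj, {"a", "c", "e"}, {"b", "d", "f"})
print(sorted(new_a), sorted(new_b), gains, total)
```

The gains [6, -3, -3] match the hand calculation: only the first pair (a, f) is kept, and the cut drops from 16 to 10. A second pass from {c, e, f} and {a, b, d} finds no positive total gain, so `kernighan_lin` stops there.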


Demerits of Kernighan-Lin Algorithm


* Minimizes the number of edges cut, not the number of nets cut.
* Does not directly allow for more than two partitions.
* Does not allow logic cells to be different sizes.
* Does not allow partitions to be unequal or find the optimum partition size.
* Does not allow for selected logic cells to be fixed in place.
* K-L finds a locally optimum solution in a random fashion:
  - random starting partition;
  - the choice of nodes to swap when several have equal gain.
* Expensive in computation time:
  - the computation time grows as $n^2 \log n$ for 2n nodes.


Solution:


To implement a net-cut partitioning rather than an edge-cut partitioning,
keep track of the nets rather than the edges: the F-M algorithm.
* Hypergraph: to represent nets with multiple terminals in a
network accurately, we use a hypergraph.

* A hypergraph consists of:
  - a star: a special type of vertex;
  - a hyperedge: a net with more than two terminals corresponds to one hyperedge
    in a hypergraph.

[Figure: (a) a network containing net y; (b) the network hypergraph with a star node and a hyperedge.]

*FIGURE - A hypergraph. (a) The network contains a net y with three terminals. (b) In the
network hypergraph we can model net y by a single hyperedge (B, C, D) and a star node. Now
there is a direct correspondence between wires or nets in the network and hyperedges in the graph.


Fiduccia-Mattheyses (F-M) Algorithm


* Addresses the difference between nets and edges.
* Reduces the computation time.


Key Features of F-M:


Base logic cell: only one logic cell moves at a time.
- The base logic cell is chosen to maintain balance between partitions, in order to stop the
algorithm from moving all the logic cells to one large partition.
- Balance: the ratio of total logic cell size in one partition to the total logic cell size in the
other. Altering the balance allows us to vary the sizes of the partitions.


Critical nets: used to simplify the gain calculations.
- A net is a critical net if it has an attached logic cell that, when swapped, changes the
number of nets cut.
- It is only necessary to recalculate the gains of logic cells on critical nets that are attached
to the base logic cell.


The logic cells that are free to move are stored in a doubly linked list. The lists are
sorted according to gain. This allows the logic cells with maximum gain to be found
quickly.


Reduced computation time: it increases only slightly more than linearly with the
number of logic cells in the network.


Features of FM Algorithm


* Modification of KL algorithm:
  - Can handle non-uniform vertex weights (areas).
  - Allows unbalanced partitions.
  - Extended to handle hypergraphs.
  - Uses a clever way to select vertices to move, so it runs much faster.


Ideas of FM Algorithm


* Similar to KL:
  - Works in passes.
  - Locks vertices after they are moved.
  - Actually moves only those vertices up to the maximum
    partial sum of gain.


* Difference from KL:
  - Does not exchange pairs of vertices;
    moves only one vertex at a time.
  - Uses the gain bucket data structure.


Gain Bucket Data Structure


FM Partitioning:
Moves are made based on object gain.

Object gain: the amount of change in cut crossings that will occur if an object is
moved from its current partition into the other partition.

- Each object is assigned a gain.
- Objects are put into a sorted gain list.
- The object with the highest gain from the larger of the two
  sides is selected and moved.
- The moved object is "locked".
- Gains of "touched" objects are recomputed.
- Gain lists are resorted.




Time Complexity of FM


* For each pass:
  - Constant time to find the best vertex to move.
  - After each move, the time to update the gain buckets is
    proportional to the degree of the vertex moved.
  - Total time is O(p), where p is the total number of pins.


* The number of passes is usually small.
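The following Python sketch shows one F-M pass on a hypergraph: single-cell moves, locking, recomputing the gains of "touched" cells that share a net with the moved cell, and keeping only the prefix of moves with the largest cumulative gain. To stay short it makes simplifying assumptions: a dictionary keyed by gain stands in for the array of doubly linked gain buckets (so finding the best cell is not constant time here), touched gains are recomputed from scratch instead of through critical-net bookkeeping, and balance is enforced by a hypothetical `max_area` limit per side rather than a balance ratio. Cell and net names are made up.

```python
from collections import defaultdict

def cut_size(nets, side):
    """Number of nets with terminals in both partitions."""
    return sum(1 for net in nets if len({side[c] for c in net}) > 1)

def fm_pass(areas, nets, side, max_area):
    """One Fiduccia-Mattheyses pass. areas: {cell: area}; nets: list of cell lists;
    side: {cell: 0 or 1}; max_area: largest total area allowed in either partition."""
    side = dict(side)
    nets_of = defaultdict(list)
    for i, net in enumerate(nets):
        for cell in net:
            nets_of[cell].append(i)

    def gain(cell):
        g = 0
        for i in nets_of[cell]:
            same = sum(1 for x in nets[i] if side[x] == side[cell])
            if same == 1:
                g += 1          # cell is alone on its side: moving it uncuts the net
            if same == len(nets[i]):
                g -= 1          # net is uncut now: moving the cell would cut it
        return g

    area = [0, 0]
    for cell, a in areas.items():
        area[side[cell]] += a
    gains = {cell: gain(cell) for cell in areas}
    buckets = defaultdict(set)              # gain -> set of free cells with that gain
    for cell, g in gains.items():
        buckets[g].add(cell)
    free, moves, history = set(areas), [], []
    while free:
        base = None
        for g in sorted((k for k, v in buckets.items() if v), reverse=True):
            # highest-gain free cell whose move keeps the destination under max_area
            legal = [c for c in sorted(buckets[g])
                     if area[1 - side[c]] + areas[c] <= max_area]
            if legal:
                base = legal[0]
                break
        if base is None:
            break
        buckets[gains[base]].discard(base)
        free.discard(base)                  # lock the base cell for the rest of the pass
        history.append(gains[base])
        moves.append(base)
        area[side[base]] -= areas[base]
        side[base] = 1 - side[base]
        area[side[base]] += areas[base]
        # recompute gains only for free cells on nets attached to the base cell
        for x in {x for i in nets_of[base] for x in nets[i] if x in free}:
            buckets[gains[x]].discard(x)
            gains[x] = gain(x)
            buckets[gains[x]].add(x)
    # keep the prefix of moves with the largest cumulative gain; undo the rest
    best_k, best_total, running = 0, 0, 0
    for k, g in enumerate(history, start=1):
        running += g
        if running > best_total:
            best_k, best_total = k, running
    for cell in moves[best_k:]:
        side[cell] = 1 - side[cell]
    return side, best_total

areas = {c: 1 for c in "ABCDEF"}
nets = [["A", "B"], ["B", "C", "D"], ["D", "E"], ["E", "F"], ["A", "C"]]
start = {"A": 0, "B": 1, "C": 0, "D": 1, "E": 0, "F": 1}
new_side, total = fm_pass(areas, nets, start, max_area=4)
print(cut_size(nets, start), "->", cut_size(nets, new_side), "gain", total)
```

On this made-up five-net example the pass moves E and then B, keeps those two moves (cumulative gain 3), undoes the four later moves, and reduces the cut from 4 nets to 1. Because gains are counted per net rather than per edge, this is a net-cut partitioning, which addresses the first K-L demerit.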


Overcoming problems in K-L using the F-M
algorithm


* To generate unequal partitions:
  - in the K-L algorithm, dummy logic cells with no connections are introduced;
  - in the F-M algorithm, the partition size is adjusted according to the balance parameter.
* To fix logic cells in place during partitioning:
  - in the F-M algorithm, those logic cells are not considered as base logic cells.
Ratio-Cut Algorithm


Removes the restriction of constant partition sizes.


The cut weight W for a cut that divides a network into two partitions, A
and B, is given by

$W = \sum_{a \in A,\, b \in B} c_{ab}$


The ratio of a cut is defined as

$R = W / (|A|\,|B|)$


|A| and |B|, the sizes of the partitions, are equal to the number of nodes each
contains (also known as the set cardinality). The cut that minimizes R is
called the ratio cut.
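A brute-force Python sketch makes the definition concrete. It evaluates R = W/(|A| |B|) for every two-way split of a tiny hypothetical network (a fully connected group of four cells joined by one connection to a pair of cells, all edge weights 1) and reports the split with the smallest ratio. Exhaustive search is used only for illustration; it does not scale to real networks.

```python
from itertools import combinations

def cut_weight(adj, part_a, part_b):
    """W: total weight of connections between partitions A and B."""
    return sum(adj[a].get(b, 0) for a in part_a for b in part_b)

def ratio(adj, part_a, part_b):
    """R = W / (|A| |B|)."""
    return cut_weight(adj, part_a, part_b) / (len(part_a) * len(part_b))

def best_ratio_cut(adj):
    """Try every non-trivial split (exponential, so tiny graphs only)."""
    nodes = sorted(adj)
    best = None
    for k in range(1, len(nodes)):
        for part_a in combinations(nodes, k):
            part_b = [n for n in nodes if n not in part_a]
            r = ratio(adj, part_a, part_b)
            if best is None or r < best[0]:
                best = (r, set(part_a), set(part_b))
    return best

edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F")]
adj = {n: {} for n in "ABCDEF"}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

print(best_ratio_cut(adj))
print(ratio(adj, "ABC", "DEF"))   # the balanced split, for comparison
```

The ratio cut separates the loosely attached pair {E, F} (R = 1/8), an unequal split that an algorithm forced to keep equal halves cannot choose; the balanced split {A, B, C} | {D, E, F} has R = 3/9.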
Ratio-Cut Algorithm (contd.,)


* A network is partitioned into small, highly connected groups
using ratio cuts.

* A reduced network is formed from these groups.
  - Each small group of logic cells forms a node in the reduced network.

* Finally, the F-M algorithm is applied to improve the reduced
network.


Difference between Ratio-Cut and K-L


The K-L algorithm minimizes W while keeping partitions
A and B the same size; the ratio cut minimizes R and lets the partition sizes vary.


Look-ahead Algorithm
Why look-ahead?


* The K-L and F-M algorithms consider only the immediate gain to be made
by moving a node.

* When there is a tie between nodes with equal gain (as often happens), there is no
mechanism to make the best choice.


Algorithm


* The gain for the initial move is called the first-level gain.
* Gains from subsequent moves are the second-level and higher gains.
* Define a gain vector that contains these gains.
* The choice of nodes to be swapped is made using the gain vector.
* This reduces both the mean and the variation in the number of cuts in the
resulting partitions.
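Look-ahead Algorithm

[Figure: partitionings (a)-(f) of a small network between partitions A and B. (a) to (b) to (c): move 2 to B (gain = +1), then move 3 to B (gain = +1). (d) to (e) to (f): move 5 to B (gain = +1), then move 4 to B (gain = +2).]

* Gain vector: Move 2 to B = +1, Move 3 to B = +1

* Gain vector: Move 5 to B = +1, Move 4 to B = +2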
Look-ahead Algorithm (contd.,)


* An example of network partitioning that shows the need to look ahead when
selecting logic cells to be moved between partitions.

* Partitionings (a), (b), and (c) show one sequence of moves: Partition I.
* Partitionings (d), (e), and (f) show a second sequence: Partition II.

* Partition I:
The partitioning in (a) can be improved by moving node 2 from A to B with a
gain of 1. The result of this move is shown in (b). This partitioning can be
improved by moving node 3 to B, again with a gain of 1.
* Partition II:
The partitioning shown in (d) is the same as (a). We can move node 5 to B
with a gain of 1 as shown in (e), but now we can move node 4 to B with a gain of 2.
Simulated Annealing


Takes an existing solution and then makes successive changes in a series
of random moves.

Each move is accepted or rejected based on an energy function.

In the interchange method:
* Accept the new trial configuration only if the energy function decreases,
which means the new configuration is an improvement.

But in simulated annealing:
* Accept the new configuration even if the energy function increases for
the new configuration, which means things are getting worse.
* The probability of accepting a worse configuration is controlled by the
exponential expression $\exp(-\Delta E / T)$,

where $\Delta E$ is the resulting increase in the energy function, and
T is a variable that can be controlled and corresponds to the
temperature in the annealing (slow cooling) of a metal (this is why the process is called
simulated annealing).
Simulated Annealing


A parameter α relates the temperatures $T_i$ and $T_{i+1}$ at the i-th and (i + 1)-th
iteration:

$T_{i+1} = \alpha T_i$

As the temperature is slowly decreased, the probability of making moves that
increase the energy function decreases.
Cooling schedule: the critical parameter of the simulated-annealing algorithm is
the rate at which the temperature T is reduced.

Finally, as the temperature approaches zero, we refuse to make any moves that
increase the energy of the system, and the system falls and comes to rest at the
nearest local minimum.

The minimums of the energy function correspond to possible solutions.

The best solution is the global minimum.
Simulated Annealing


Requirement of simulated annealing:
To find a good solution, a local minimum close to the global minimum, requires a
high initial temperature and a slow cooling schedule.

Disadvantage:
This results in many trial moves and a very long computer run time (in return, the
solution found is close to the optimum).

Advantage:
It can be used to solve large graph problems.

Hill climbing: accepting moves that seemingly take us away from a desirable
solution allows the system to escape from a local minimum and find other,
better, solutions.
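A minimal Python sketch of simulated annealing applied to two-way partitioning ties these pieces together: the energy is the cut weight, a trial move swaps one randomly chosen cell from each side (so the partitions stay equal in size), an uphill move is accepted with probability exp(-ΔE/T), and the temperature follows T(i+1) = αT(i). The starting temperature, α, moves per temperature, stopping temperature, and random seed are illustrative assumptions, not recommended values. The example reuses the six-node graph from the Kernighan-Lin worked example.

```python
import math
import random

def anneal_bisection(adj, part_a, part_b, t0=10.0, alpha=0.95,
                     moves_per_t=50, t_min=0.01, seed=1):
    """Simulated annealing for a balanced two-way partition; energy = cut weight."""
    rng = random.Random(seed)
    a, b = sorted(part_a), sorted(part_b)

    def energy():
        in_a = set(a)
        return sum(w for u in a for v, w in adj[u].items() if v not in in_a)

    e = energy()
    best = (e, set(a), set(b))
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            i, j = rng.randrange(len(a)), rng.randrange(len(b))
            a[i], b[j] = b[j], a[i]                   # trial move: swap two cells
            delta = energy() - e
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                e += delta                            # accept, even if uphill
                if e < best[0]:
                    best = (e, set(a), set(b))
            else:
                a[i], b[j] = b[j], a[i]               # reject: undo the swap
        t *= alpha                                    # cooling schedule
    return best

edges = [("a", "b", 3), ("a", "c", 4), ("a", "e", 2), ("b", "c", 1), ("b", "d", 2),
         ("c", "d", 2), ("c", "e", 3), ("c", "f", 4), ("d", "f", 1), ("e", "f", 6)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w

print(anneal_bisection(adj, {"a", "c", "e"}, {"b", "d", "f"}))
```

With these settings the run reaches a cut weight of 10, the same bisection that Kernighan-Lin finds. Recording the best configuration seen is a common practical addition, since the final configuration of an annealing run is not guaranteed to be the best one visited.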
Other Partitioning Objectives


Constraint or objective | Purpose | Implementation
Timing constraints | Certain logic cells in a system may need to be located on the same ASIC in order to avoid adding the delay of any external interconnections. | Add weights to nets to make them more important than others.
Power constraints | Some logic cells may consume more power than others. | Assign more than rough estimates of power consumption to each logic cell at the system-planning stage, before any simulation has been completed.
Technology constraints | To include memory on an ASIC. | Keep logic cells requiring similar technology together.
Technology constraints | To use a low-cost package. | Keep ASICs below a certain size.
Test constraints | Maintain observability and controllability. | Force certain connections to be external.


FPGA Partitioning
- An asynchronous transfer mode (ATM)
connection simulator


ATM is a signaling protocol for many different types of traffic, including
constant bit rates (voice signals) as well as variable bit rates (compressed
video).

The ATM connection simulator is a card that is connected to a
computer.

Under computer control, the card monitors and corrupts the ATM signals to
simulate the effects of real networks.

An example would be to test different video compression algorithms.
Compressed video is very bursty (brief periods of very high activity),
has very strict delay constraints, and is susceptible to errors.
2 


* Asynchronous transfer mode (ATM) cell format

[Figure: ATM cell layout by bit number: header fields followed by the payload.]

GFC = generic flow control
VPI = virtual path identifier
VCI = virtual channel identifier
PTI = payload type identifier
CLP = cell loss priority
HEC = header error control


*FIGURE 15.4 The asynchronous transfer mode (ATM) cell format. The ATM protocol uses
53-byte cells or packets of information with a data payload and header information for
routing and error control.
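To connect the field names to actual bits, here is a small Python sketch that unpacks the 5-byte header of a 53-byte cell. It assumes the UNI header layout implied by the presence of the GFC field (GFC 4 bits, VPI 8 bits, VCI 16 bits, PTI 3 bits, CLP 1 bit, then an 8-bit HEC); the HEC byte is returned but not checked. The function name and the example field values are hypothetical.

```python
def parse_atm_uni_header(cell: bytes) -> dict:
    """Split a 53-byte ATM cell (UNI format) into header fields and payload."""
    if len(cell) != 53:
        raise ValueError("an ATM cell is 53 bytes: 5 header + 48 payload")
    h = int.from_bytes(cell[:4], "big")    # first 32 header bits; byte 5 is the HEC
    return {
        "gfc": (h >> 28) & 0xF,            # generic flow control, 4 bits
        "vpi": (h >> 20) & 0xFF,           # virtual path identifier, 8 bits
        "vci": (h >> 4) & 0xFFFF,          # virtual channel identifier, 16 bits
        "pti": (h >> 1) & 0x7,             # payload type identifier, 3 bits
        "clp": h & 0x1,                    # cell loss priority, 1 bit
        "hec": cell[4],                    # header error control (not verified here)
        "payload": cell[5:],               # 48-byte payload
    }

# Build a test cell with VPI = 5, VCI = 32, CLP = 1 and an all-zero payload.
header = ((5 << 20) | (32 << 4) | 1).to_bytes(4, "big") + bytes([0])
fields = parse_atm_uni_header(header + bytes(48))
print({k: v for k, v in fields.items() if k != "payload"})
```

A header-remapping block such as the traffic policer in the simulator operates on exactly these VPI/VCI fields.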
*An asynchronous transfer mode (ATM) connection simulator.
FPGA Partitioning


The simulator is partitioned into three major blocks:
* ATM traffic policer: regulates the input to the simulator.
* ATM cell delay generator: delays ATM cells, reorders
ATM cells, and inserts ATM cells with valid ATM cell headers.
* ATM cell error generator: produces bit errors and four
random variables that are needed by the other two blocks.


The traffic policer performs the following operations:
* Performs header screening and remapping.
* Checks ATM cell conformance.
* Deletes selected ATM cells.
The delay generator delays, misinserts, and reorders the target ATM cells.
The error generator performs the following operations:
* Payload bit error ratio generation: the user specifies the Bernoulli
probability $p_{BER}$ of the payload bit error ratio.
* Random variable generation for ATM cell loss, misinsertion,
reordering, and deletion.
Automatic Partitioning with FPGAs


Altera hardware description language (AHDL)
- To direct the partitioner to automatically partition logic into chips
within the same family, use the AUTO keyword:
DEVICE top_level IS AUTO; % the partitioner assigns logic %


- Use the CLIQUE keyword to keep logic together:
CLIQUE fast_logic
BEGIN
|shift_register: MACRO; % keep this in one device %
END;
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Power Dissipation 


* Dynamic power dissipation
  - Switching current from charging and discharging parasitic capacitance.
  - Short-circuit current when both n-channel and p-channel transistors are momentarily on at the same time.

* Static power dissipation
  - Subthreshold current
  - Leakage current
Switching current


Switching current comes from charging and discharging parasitic capacitance.
When the p-channel transistor in an inverter is charging a capacitance, C, at a frequency, f,
- the current through the transistor is $I = C\,(dV/dt)$.
- The power dissipation is $P = VI = CV\,(dV/dt)$ for one-half the period of the input, $t = 1/(2f)$.
- The energy dissipated in the p-channel transistor during this half period is thus

$$\int_0^{1/(2f)} P\,dt = \int_0^{V_{DD}} CV\,dV = \tfrac{1}{2} C V_{DD}^2$$

When the n-channel transistor discharges the capacitor, the energy dissipated is equal (i.e., $\tfrac{1}{2} C V_{DD}^2$).

The energy per cycle is therefore $C V_{DD}^2$, and the total power dissipation at frequency f is $P_1 = f C V_{DD}^2$.

Most of the power dissipation in a CMOS ASIC arises from this source: the switching current.

The best way to reduce power is to reduce $V_{DD}$ (power scales as $V_{DD}^2$) and to reduce C, the amount of capacitance we have to switch.
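The derivation can be checked numerically with a minimal Python sketch. A midpoint-rule integral of C·V dV reproduces ½CV_DD², and the power then follows as P_1 = f·C·V_DD². The function names are illustrative. The 10 pF, 3.3 V, and 100 MHz values are the ones used in the power-dissipation problem below.

```python
def charging_energy(c: float, vdd: float, steps: int = 10_000) -> float:
    """Energy for charging C from 0 to VDD: integral of C*V dV,
    evaluated with the midpoint rule (exact for this linear integrand)."""
    dv = vdd / steps
    return sum(c * (k + 0.5) * dv * dv for k in range(steps))


def switching_power(c: float, vdd: float, f: float) -> float:
    """P1 = f * C * VDD**2: one charge and one discharge, each C*VDD**2/2, per cycle."""
    return f * c * vdd ** 2


C, VDD, F = 10e-12, 3.3, 100e6
print(charging_energy(C, VDD), 0.5 * C * VDD ** 2)   # both about 5.4e-11 J
print(switching_power(C, VDD, F))                    # about 0.01089 W
# Halving VDD cuts switching power to about one quarter.
print(switching_power(C, VDD / 2, F) / switching_power(C, VDD, F))
```

The last line shows why reducing V_DD is the most effective lever. Switching power falls with the square of the supply voltage, but only linearly with C.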


Short-circuit current

Both n-channel and p-channel transistors are momentarily on at the same time.


The short-circuit current or crowbar current can be particularly important 
for output drivers and large clock buffers. 


For a CMOS inverter, the power dissipation due to the crowbar current is

$$P_2 = \frac{\beta}{12}\,\frac{t_{rf}}{t_p}\,(V_{DD} - 2V_{tn})^3$$

Transistor gain factor:

$$\beta = \frac{I_{DS}}{(V_{GS} - V_{tn})\,V_{DS} - V_{DS}^2/2}$$

- where $\beta = (W/L)\,\mu C_{ox}$ is the same for both p- and n-channel transistors.
- The threshold voltages are assumed equal in magnitude for both transistor types ($|V_{tp}| = V_{tn}$).
- $t_{rf}$ is the rise and fall time (assumed equal) of the input signal, and $t_p = 1/f$ is the period of the input [Veendrick, 1984].


Problem on Power Dissipation 


Consider an output buffer that is capable of sinking 12 mA at an output voltage of 0.5 V. Derive the transistor gain factor (assume $V_{GS} = V_{DD} = 3.3$ V; $V_{tn} = 0.65$ V).

If the output buffer is switching at 100 MHz and the input rise time to the buffer is 2 ns, calculate the power dissipation due to short-circuit current.
If the output load is 10 pF, calculate the dissipation due to switching current.

What do you infer from this?

Inference:
($\beta = 0.01$ A V$^{-2}$; short-circuit current, $P_2 = 0.00133$ W, or about 1.3 mW; switching current, $P_1 = 0.01089$ W, or about 11 mW)

- The short-circuit current is typically less than 10 percent of the switching current.
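The inference numbers can be reproduced with the short Python script below. It assumes the linear-region model $I_{DS} = \beta[(V_{GS} - V_{tn})V_{DS} - V_{DS}^2/2]$ for the pull-down transistor sinking 12 mA at V_DS = 0.5 V. It also uses $P_2 = (\beta/12)(t_{rf}/t_p)(V_{DD} - 2V_{tn})^3$ with $t_p = 1/f$, and $P_1 = fCV_{DD}^2$.

```python
# Output-buffer problem: derive beta, then the short-circuit (P2) and
# switching (P1) power dissipation.
V_DD = 3.3       # V, also V_GS of the n-channel pull-down
V_TN = 0.65      # V, threshold voltage
V_DS = 0.5       # V, output voltage while sinking
I_SINK = 12e-3   # A, current sunk at V_DS
F = 100e6        # Hz, switching frequency
T_RF = 2e-9      # s, input rise (and fall) time
C_LOAD = 10e-12  # F, output load

# Linear region: I_DS = beta * ((V_GS - V_tn) * V_DS - V_DS**2 / 2)
beta = I_SINK / ((V_DD - V_TN) * V_DS - V_DS ** 2 / 2)

t_p = 1 / F                                               # input period
p2 = (beta / 12) * (T_RF / t_p) * (V_DD - 2 * V_TN) ** 3   # crowbar current
p1 = F * C_LOAD * V_DD ** 2                               # switching current

print(f"beta = {beta:.4f} A/V^2")     # beta = 0.0100 A/V^2
print(f"P2 = {p2 * 1e3:.2f} mW")      # P2 = 1.33 mW
print(f"P1 = {p1 * 1e3:.2f} mW")      # P1 = 10.89 mW
print(f"P2/P1 = {p2 / p1:.2f}")       # P2/P1 = 0.12
```

In this example the ratio is about 12 percent. That is the same order of magnitude as the rule of thumb that short-circuit dissipation is typically below 10 percent of switching dissipation.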


Subthreshold current

A CMOS transistor is never completely off.

When the gate-to-source voltage, $V_{GS}$, of an MOS transistor is less than the threshold voltage, $V_t$, the transistor conducts a very small subthreshold current in the subthreshold region:

$$I_{DS} = I_0 \exp\left(\frac{q V_{GS}}{n k T}\right)$$

- where $I_0$ is a constant, and the constant, n, is normally between 1 and 2.


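The slope, S, of the transistor current in the subthreshold region is

$$S = \frac{nkT}{q}\ln 10$$

Find the slope of the transistor current, S, at a junction temperature T = 125 °C (400 K), assuming n = 1.5 (assume $q = 1.6 \times 10^{-19}$ C, $k = 1.38 \times 10^{-23}$ J K$^{-1}$). What do you infer from the result for the slope of the transistor current?

Inference:

$S = \frac{(1.5)(1.38 \times 10^{-23})(400)}{1.6 \times 10^{-19}} \ln 10 \approx 51.8\ \text{mV} \times 2.303 \approx 119$ mV, i.e., about 120 mV/decade, which does not scale with the process.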


The constant value of S = 120 mV/decade means it takes 120 mV to reduce the 
subthreshold current by a factor of 10 in any process. 
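The slope calculation, and what "per decade" means, can be checked with a minimal Python sketch using the constants given in the exercise. The values I_0 = 1 pA and V_GS = 0.3 V in the ratio check are arbitrary, hypothetical choices. The ratio does not depend on them.

```python
import math

K = 1.38e-23   # J/K, Boltzmann constant (value given in the exercise)
Q = 1.6e-19    # C, electron charge (value given in the exercise)


def subthreshold_slope(n: float, t_kelvin: float) -> float:
    """S = (n k T / q) ln 10, in volts per decade of drain current."""
    return n * K * t_kelvin / Q * math.log(10)


def subthreshold_current(i0: float, v_gs: float, n: float, t_kelvin: float) -> float:
    """I_DS = I0 * exp(q V_GS / (n k T)) in the subthreshold region."""
    return i0 * math.exp(Q * v_gs / (n * K * t_kelvin))


s = subthreshold_slope(n=1.5, t_kelvin=400)
print(f"S = {s * 1e3:.0f} mV/decade")   # S = 119 mV/decade

# Lowering V_GS by S cuts the subthreshold current by a factor of 10.
ratio = (subthreshold_current(1e-12, 0.3, 1.5, 400)
         / subthreshold_current(1e-12, 0.3 - s, 1.5, 400))
print(f"{ratio:.3f}")                   # 10.000
```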
Leakage Current


Transistor leakage is caused by the fact that a reverse-biased diode conducts a very small leakage current.

The sources and drains of every transistor, as well as the junctions between the wells and substrate, form parasitic diodes.

The parasitic-diode leakage currents are strongly dependent on the
- type and quality of the process
- temperature.

The parasitic diodes have two components in parallel: an area diode and a perimeter diode.
The leakage current due to the perimeter diode is larger than that due to the area diode.


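The ideal parasitic diode currents are given by the following equation:

$$I = I_s\left[\exp\left(\frac{qV_D}{nkT}\right) - 1\right]$$

- where $I_s$ is the diode saturation current, $V_D$ is the voltage across the diode (negative under reverse bias), and n is the ideality factor.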
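The minimal Python sketch below shows why a reverse-biased junction leaks a nearly constant current. It evaluates the ideal diode equation $I = I_s[\exp(qV_D/(nkT)) - 1]$. The saturation current I_s = 1 fA and the bias points are hypothetical values chosen for illustration. Real values depend on junction area, perimeter, process, and temperature.

```python
import math

K = 1.38e-23   # J/K
Q = 1.6e-19    # C


def diode_current(i_s: float, v_d: float, n: float = 1.0, t_kelvin: float = 300.0) -> float:
    """Ideal diode: I = I_s * (exp(q V_D / (n k T)) - 1); V_D < 0 is reverse bias.
    math.expm1 keeps precision when the exponent is close to zero."""
    return i_s * math.expm1(Q * v_d / (n * K * t_kelvin))


I_S = 1e-15  # A, hypothetical saturation current
for v_d in (-0.05, -0.1, -0.5, -3.3):
    print(f"V_D = {v_d:5.2f} V -> I = {diode_current(I_S, v_d):.3e} A")
# Beyond a few kT/q of reverse bias the current saturates at about -I_s.
```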
UNIT I


FLOORPLANNING 
Dr.K.Kalyani 
AP, ECE, 
TCE. 
Introduction


• The input to the floorplanning step is the output of system partitioning and design entry: a netlist.

• A netlist describes circuit blocks, the logic cells within the blocks, and their connections.


[Figure: delay (ns) versus minimum feature size (μm), from 1.0 through 0.5 to 0.25, for interconnect delay and gate delay.]
FIGURE 16.3 Interconnect and gate delays. As feature sizes decrease, both average interconnect delay and average gate delay decrease, but at different rates. This is because interconnect capacitance tends to a limit that is independent of scaling. Interconnect delay now dominates gate delay.


• Floorplanning lets us predict interconnect delay by estimating interconnect length.
• The starting point of the floorplanning and placement steps for the Viterbi decoder is a collection of standard cells with no room set aside yet for routing.


• The small boxes that look like bricks are the outlines of the standard cells.

• The largest standard cells, at the bottom of the display (labeled dfctnb), are 188 D flip-flops.

• The '*' symbols are the drawing origins of the standard cells; for the D flip-flops they are shifted to the left of and below the bottom left-hand corner of the logic cell.

• The large box surrounding all the logic cells is the estimated chip size.

(This is a screen shot from Cadence Cell Ensemble.)
The Viterbi decoder after floorplanning and placement


• 18 rows of standard cells separated by 17 horizontal channels (labeled 2-18).
• Channels are routed as numbered.
• In this example, the I/O pads are omitted to show the cell placement more clearly.
Floorplanning Goals and Objectives


• The input to a floorplanning tool is a hierarchical netlist that describes
- the interconnection of the blocks (RAM, ROM, ALU, cache controller, and so on)
- the logic cells (NAND, NOR, D flip-flop, and so on) within the blocks
- the logic cell connectors (terminals, pins, or ports)

• The netlist is a logical description of the ASIC;
• the floorplan is a physical description of the ASIC.

• Floorplanning is a mapping between the logical description (the netlist) and the physical description (the floorplan).


The goals of floorplanning are to:
• Arrange the blocks on a chip,
• Decide the location of the I/O pads,
• Decide the location and number of the power pads,
• Decide the type of power distribution, and
• Decide the location and type of clock distribution.

Objectives of floorplanning:
• To minimize the chip area
• To minimize delay.
Measuring area is straightforward, but measuring delay is more difficult.


Measurement of Delay in Floor planning 


FIGURE 16.4 Predicted capacitance. (a) Interconnect lengths as a function of fanout (FO) and circuit-block size. (b) Wire-load table. There is only one capacitance value for each fanout (typically the average value). (c) The wire-load table predicts the capacitance and delay of a net (with a considerable error). Net A and net B both have a fanout of 1 and the same predicted net delay, but net B in fact has a much greater delay than net A in the actual layout (of course we shall not know what the actual layout is until much later in the design process).


Measurement of Delay in Floor planning (contd.)

A floorplanning tool can use predicted-capacitance tables (also known as interconnect-load tables or wire-load tables).

Typically between 60 and 70 percent of nets have a FO = 1.

The distribution for a FO = 1 has a very long tail, stretching to interconnects that run from corner to corner of the chip.

The distribution for a FO = 1 often has two peaks. One corresponds to close neighbors in subgroups within a block. It is superimposed on a second distribution corresponding to routing between subgroups.
Measurement of Delay in Floor planning (contd.)

We often see a twin-peaked distribution at the chip level also, corresponding to separate distributions for intrablock routing (inside blocks) and interblock routing (between blocks).

The distributions for FO > 1 are more symmetrical and flatter than for FO = 1.

The wire-load tables can only contain one number, for example the average net capacitance, for any one distribution.

Many tools take a worst-case approach and use the 80- or 90-percentile point instead of the average. Thus a tool may use a predicted capacitance for which we know 90 percent of the nets will have less than the estimated capacitance.
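Building one wire-load table entry from extracted net capacitances takes only a few lines of Python. For each fanout bucket, the sketch reports both the average and the 90-percentile value described above. The sample capacitances are hypothetical, chosen with a long tail like the FO = 1 distribution. `statistics.quantiles` with `method="inclusive"` (Python 3.8+) supplies the percentile.

```python
import statistics


def wire_load_entry(caps_pf, percentile=90):
    """Return (average, percentile point) of one fanout bucket's net capacitances."""
    mean = statistics.fmean(caps_pf)
    cut_points = statistics.quantiles(caps_pf, n=100, method="inclusive")
    return mean, cut_points[percentile - 1]   # cut_points[89] is the 90th percentile


# Hypothetical extracted net capacitances (pF) grouped by fanout.
nets_by_fanout = {
    1: [0.02, 0.02, 0.03, 0.03, 0.03, 0.04, 0.05, 0.08, 0.15, 0.40],  # long tail
    2: [0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12],
}

wire_load_table = {fo: wire_load_entry(caps) for fo, caps in nets_by_fanout.items()}
for fo, (avg, p90) in sorted(wire_load_table.items()):
    print(f"FO = {fo}: average {avg:.3f} pF, 90-percentile {p90:.3f} pF")
```

With a long-tailed distribution, the 90-percentile entry sits well above the average. This is what makes the worst-case table conservative.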
Measurement of Delay in Floor planning (contd.)

• Repeat the statistical analysis for blocks with different sizes. For example, a net with a FO = 1 in a 25 k-gate block will have a different (larger) average length than if the net were in a 5 k-gate block.

• The statistics depend on the shape (aspect ratio) of the block (usually the statistics are only calculated for square blocks).
• The statistics will also depend on the type of netlist.

For example, the distributions will be different for a netlist generated by setting a constraint for minimum logic delay during synthesis (which tends to generate large numbers of two-input NAND gates) than for netlists generated using minimum-area constraints.
Floorplanning Tools
Floorplanning in CBIC:

Flexible blocks (or variable blocks):
- Their total area is fixed.
- Their shape (aspect ratio) and connector locations may be adjusted during the placement.

Fixed blocks:
- The dimensions and connector locations of fixed blocks (perhaps RAM, ROM, compiled cells, or megacells) can only be modified when they are created.

Seeding:
- We force logic cells to be in selected flexible blocks by seeding. We choose seed cells by name.
- Seeding may be hard or soft.
Hard seed: fixed and not allowed to move during the remaining floorplanning and placement steps.
Soft seed: an initial suggestion only, which the floorplanner can alter if necessary.

We can also seed connectors within flexible blocks, forcing certain nets to appear in a specified order or location at the boundary of a flexible block.

Rat's nest: displays the connections between the blocks.
Connections are shown as bundles between the centers of blocks or as flight lines between connectors.


Floorplanning Tools 


[Figure panels (a)-(d). Labels: flexible standard-cell blocks (not yet placed); flexible standard-cell blocks (with estimated placement); fixed blocks; boundary; nets in bundle; terminal, pin, or port; mirror about x-axis; move down; swap.]
Floorplanning a cell-based ASIC. 


(a) Initial floorplan generated by the floorplanning tool. Two of the blocks are flexible (A and C) 
and contain rows of standard cells (unplaced). A pop-up window shows the status of block A. 


(b) An estimated placement for flexible blocks A and C. The connector positions are known and a 
rat's nest display shows the heavy congestion below block B. 


(c) Moving blocks to improve the floorplan. 
(d) The updated display shows the reduced congestion after the changes. 
Aspect ratio and Congestion Analysis


(a) The initial floorplan with a 2:1.5 die aspect ratio.

(b) Altering the floorplan to give a 1:1 chip aspect ratio.

Congestion analysis: one measure of congestion is the difference between the number of interconnects that we actually need, called the channel density, and the channel capacity.

(c) A trial floorplan with a congestion map. Blocks A and C have been placed so that we know the terminal positions in the channels. Shading indicates the ratio of channel density to the channel capacity. Dark areas show regions that cannot be routed because the channel congestion exceeds the estimated capacity.
(d) Resizing flexible blocks A and C alleviates congestion.
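Channel density, as used in the congestion measure above, is the largest number of nets that must run past any single column of the channel. This is a minimal Python sketch. It assumes each net in a channel is reduced to the span (leftmost, rightmost terminal column) it must cover. The channel names, spans, and capacity are hypothetical.

```python
def channel_density(spans):
    """Maximum number of net spans (left, right) covering any one column.
    Spans are closed intervals, so nets that meet at a column both count."""
    events = []
    for left, right in spans:
        events.append((left, 0))    # 0 sorts first: open before closing at a tie
        events.append((right, 1))
    density = current = 0
    for _, kind in sorted(events):
        if kind == 0:
            current += 1
            density = max(density, current)
        else:
            current -= 1
    return density


def congestion_map(channels, capacity):
    """Ratio of channel density to channel capacity for each channel."""
    return {name: channel_density(spans) / capacity for name, spans in channels.items()}


channels = {
    "ch1": [(0, 4), (2, 6), (5, 9), (7, 8), (3, 5)],
    "ch2": [(0, 2), (3, 5), (6, 9)],
}
for name, ratio in congestion_map(channels, capacity=2).items():
    print(name, ratio, "cannot route" if ratio > 1 else "ok")
# ch1 1.5 cannot route
# ch2 0.5 ok
```

A ratio above 1 corresponds to the dark regions of the congestion map: the channel needs more tracks than it has.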


Channel Definition 


[Figure: a T-junction between channel A and channel B among blocks 1, 2, and 3, with logic cell pins. Panel labels: (a) adjust channel A first, now we can adjust channel B; (b) adjust channel B first, cannot adjust channel A.]


• Channel definition or channel allocation

• During the floorplanning step, we assign the areas between blocks that are to be used for interconnect.

• Routing a T-junction between two channels in two-level metal.
• The dots represent logic cell pins.

• (a) Routing channel A (the stem of the T) first allows us to adjust the width of channel B. (b) If we route channel B first (the top of the T), this fixes the width of channel A.

• Route the stem of a T-junction before routing the top.


Channel Routing 


[Figure: (a)-(c) circuit blocks, routing channels, the cut order, and the resulting order in which to route the channels.]


• Defining the channel routing order for a slicing floorplan using a slicing tree.

• (a) Make a cut all the way across the chip between circuit blocks. Continue slicing until each piece contains just one circuit block. Each cut divides a piece into two without cutting through a circuit block.

• (b) A sequence of cuts, 1, 2, 3, and 4, that successively slices the chip until only circuit blocks are left.

• (c) The slicing tree corresponding to the sequence of cuts gives the order in which to route the channels: 4, 3, 2, and finally 1.
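Routing channels in the reverse of the cut order amounts to a post-order walk of the slicing tree. Every channel inside a piece is routed before the channel (cut) that created the piece, so each T-junction stem is routed before its top. This is a minimal Python sketch. The tree shape is hypothetical, chosen only to reproduce the cut sequence 1, 2, 3, 4, and the block names A-E are placeholders.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Cut:
    """Internal node of a slicing tree: one cut and the two pieces it makes."""
    number: int
    first: Union["Cut", str]    # a str is a leaf: a single circuit block
    second: Union["Cut", str]


def routing_order(node) -> List[int]:
    """Post-order walk: route every channel inside a piece before the
    channel (cut) that created that piece."""
    if isinstance(node, str):
        return []
    return routing_order(node.first) + routing_order(node.second) + [node.number]


# Hypothetical tree: cut 1 splits the chip, cut 2 splits one half,
# cut 3 splits one of its pieces, and cut 4 splits the last piece.
tree = Cut(1, "A", Cut(2, "B", Cut(3, Cut(4, "C", "D"), "E")))
print(routing_order(tree))   # [4, 3, 2, 1]
```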


Cyclic Constraints 




• Cyclic constraints.

• (a) A nonslicing floorplan with a cyclic constraint that prevents channel routing. (b) In this case it is difficult to find a slicing floorplan without increasing the chip area.

• (c) This floorplan may be sliced (with initial cuts 1 or 2) and has no cyclic constraints, but it is inefficient in area use and will be very difficult to route.
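A cyclic constraint can be detected mechanically. For each channel, record the channels whose T-junction stems end on it, since those must be routed first. Then attempt a topological sort. This is a minimal Python sketch using the standard-library `graphlib` module (Python 3.9+). The four-channel pinwheel is a hypothetical example of a nonslicing floorplan.

```python
from graphlib import CycleError, TopologicalSorter


def channel_routing_order(stems_ending_on):
    """stems_ending_on maps each channel to the channels that form the stem of
    a T-junction on it; those stems must be routed first. Returns a routing
    order, or None if the constraints are cyclic."""
    try:
        return list(TopologicalSorter(stems_ending_on).static_order())
    except CycleError:
        return None


# Slicing floorplan: channel 1 is the top of a T whose stem is channel 2, etc.
slicing = {1: {2}, 2: {3}, 3: {4}}
# Pinwheel (nonslicing) floorplan: each channel is the stem of the next one's T.
pinwheel = {"A": {"D"}, "B": {"A"}, "C": {"B"}, "D": {"C"}}

print(channel_routing_order(slicing))    # [4, 3, 2, 1]
print(channel_routing_order(pinwheel))   # None: cyclic constraint
```

Merging blocks, as in the next figure, removes one of the T-junctions and breaks the cycle.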
Cyclic Constraints

[Figure labels: cyclic constraint; standard cell areas A and C; channel number (in routing order).]

• (a) We can eliminate the cyclic constraint by merging the blocks A and C.
• (b) A slicing structure.
I/O and Power Planning (contd.)

Every chip communicates with the outside world. Signals flow onto and off the chip and we need to supply power.

The I/O and power constraints have to be considered early in the floorplanning process.

A silicon chip or die (plural die, dies, or dice) is mounted on a chip carrier inside a chip package.

Connections are made by bonding the chip pads to fingers on a metal lead frame that is part of the package.

The metal lead-frame fingers connect to the package pins. A die consists of a logic core inside a pad ring.

[Figure labels: die attach; epoxy mold compound.]


I/O and Power Planning 
FIGURE 16.12 Pad-limited and core-limited die.

• (a) A pad-limited die uses tall, thin pad-limited pads, which maximize the number of pads we can fit around the outside of the chip. The number of pads determines the die size.

• (b) A core-limited die uses short, wide core-limited pads. The core logic determines the die size.

• (c) Using both pad-limited pads and core-limited pads.
I/O and Power Planning (contd.)
Power Pads and I/O Pads:

Special power pads are used for:
1. the positive supply, or VDD, power buses (or power rails), and
2. ground or negative supply, VSS or GND.

One set of VDD/VSS pads supplies power to the I/O pads only.

Another set of VDD/VSS pads connects to a second power ring that supplies the logic core.

The I/O power is dirty power, since it has to supply large transient currents to the output transistors. We keep dirty power separate to avoid injecting noise into the internal-logic power (the clean power).

I/O pads also contain special circuits to protect against electrostatic discharge (ESD). These circuits can withstand very short high-voltage (several kilovolt) pulses that can be generated during human or machine handling.
I/O and Power Planning (contd.)

Pad seed: fixes the position of the chip pad for down bonding.

If we make an electrical connection between the substrate and a chip pad, or to a package pin, it must be to VDD (n-type substrate) or VSS (p-type substrate). This substrate connection (for the whole chip) employs a down bond (or drop bond) to the carrier. We have several options:

» Dedicate one (or more) chip pad(s) to down bond to the chip carrier.
» Make a connection from a chip pad to the lead frame and down bond from the chip pad to the chip carrier.
» Make a connection from a chip pad to the lead frame and down bond from the lead frame.
» Down bond from the lead frame without using a chip pad.
» Leave the substrate and/or chip carrier unconnected.

Depending on the package design, the type and positioning of down bonds may be fixed.

This means we need to fix the position of the chip pad for down bonding using a pad seed.
I/O and Power Planning (contd.)

A double bond connects two pads to one chip-carrier finger and one package pin. We can do this to save package pins, or to reduce the series inductance of bond wires (typically a few nanohenries) by connecting the pads in parallel.

Multiple VDD and VSS pads:

To reduce the series resistive and inductive impedance of power supply networks, it is normal to use multiple VDD and VSS pads.

This is particularly important with the simultaneously switching outputs (SSOs) that occur when driving buses.

The output pads can easily consume most of the power on a CMOS ASIC, because the load on a pad (usually tens of picofarads) is much larger than typical on-chip capacitive loads.

Depending on the technology it may be necessary to provide dedicated VDD and VSS pads for every few SSOs. Design rules set how many SSOs can be used per VDD/VSS pad pair. These dedicated VDD/VSS pads must "follow" groups of output pads as they are seeded or planned on the floorplan.
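Once the design rules fix the number of SSOs allowed per VDD/VSS pad pair, the number of dedicated pad pairs for a bus is a ceiling division. This is a minimal Python sketch. The limit of 8 SSOs per pair is a hypothetical rule, not a value from any particular technology.

```python
def power_pad_pairs(num_sso: int, sso_per_pair: int) -> int:
    """Dedicated VDD/VSS pad pairs needed so no pair serves more than
    sso_per_pair simultaneously switching outputs (ceiling division)."""
    if sso_per_pair <= 0:
        raise ValueError("sso_per_pair must be positive")
    return -(-num_sso // sso_per_pair)


# Hypothetical rule: at most 8 SSOs per VDD/VSS pair.
for bus_width in (8, 32, 33):
    print(f"{bus_width} outputs -> {power_pad_pairs(bus_width, 8)} VDD/VSS pair(s)")
```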
I/O and Power Planning (contd.)

Using a pad mapping, we translate the logical pad in a netlist to a physical pad from a pad library. We might control pad seeding and mapping in the floorplanner.

Handling of I/O pads can become quite complex; there are several nonobvious factors that must be considered when generating a pad ring:

We design library pad cells for one orientation.
- For example, an edge pad for the south side of the chip, and a corner pad for the southeast corner.

We could then generate other orientations by rotation and flipping (mirroring). However, some ASIC vendors will not allow rotation or mirroring of logic cells in the mask file.

To avoid these problems we may need to have separate horizontal, vertical, left-handed, and right-handed pad cells in the library with appropriate logical-to-physical pad mappings.

If we mix pad-limited and core-limited edge pads in the same pad ring, this complicates the design of corner pads. Usually the two types of edge pad cannot abut. In this case a corner pad also becomes a pad-format changer, or hybrid corner pad.

In single-supply chips we have one VDD net and one VSS net, both global power nets. It is also possible to use mixed power supplies (for example, 3.3 V and 5 V) or multiple power supplies (digital VDD, analog VDD).


I/O and Power Planning (contd.)

FIGURE 16.13 Bonding pads. (a) This chip uses both pad-limited and core-limited pads. (b) A hybrid corner pad. (c) A chip with stagger-bonded pads. (d) An area-bump bonded chip (or flip-chip). The chip is turned upside down and solder bumps connect the pads to the lead frame.
I/O and Power Planning (contd.)

Other I/O bonding arrangements:

A stagger-bond arrangement uses two rows of I/O pads. In this case the design rules for bond wires (the spacing and the angle at which the bond wires leave the pads) become very important.

An area-bump bonding arrangement (also known as flip-chip or solder-bump) is used, for example, with ball-grid array (BGA) packages.

Even though the bonding pads are located in the center of the chip, the I/O circuits are still often located at the edges of the chip. This is because of difficulties in power supply distribution and in integrating I/O circuits together with logic in the center of the die.

I/O pads in MGA and CBIC:

In an MGA the pad spacing and I/O-cell spacing are fixed: each pad occupies a fixed pad slot (or pad site). This means that the properties of the pad I/O are also fixed but, if we need to, we can parallel adjacent output cells to increase the drive. To increase flexibility further, the I/O cells can use a separation, the I/O-cell pitch, that is smaller than the pad pitch.

I/O and Power Planning (contd.)

FIGURE 16.14 Gate-array I/O pads. (a) Cell-based ASICs may contain pad cells of different sizes and widths. (b) A corner of a gate-array base. (c) A
I/O and Power Planning (contd.)

The long direction of a rectangular channel is the channel spine.

Some automatic routers may require that metal lines parallel to a channel spine use a preferred layer (either m1, m2, or m3). Alternatively we say that a particular metal layer runs in a preferred direction.

I/O and Power Planning (contd.)

FIGURE 16.15 Power distribution. (a) Power distributed using m1 for VSS and m2 for VDD. This helps minimize the number of vias and layer crossings needed but causes problems in the routing channels. (b) In this floorplan m1 is run parallel to the longest side of all channels, the channel spine. This can make automatic routing easier but may increase the number of vias and layer crossings. (c) An expanded view of part of a channel (interconnect is shown as lines). If power runs on different layers along the spine of a channel, this forces signals to change layers. (d) A closeup of VDD and VSS buses as they cross. Changing layers requires a large number of via contacts to reduce resistance.


Power distribution. 


(a) Power distributed using m1 for VSS and m2 for VDD.
- This helps minimize the number of vias and layer crossings needed
- but causes problems in the routing channels.

(b) In this floorplan m1 is run parallel to the longest side of all channels, the channel spine.
- This can make automatic routing easier
- but may increase the number of vias and layer crossings.


(c) An expanded view of part of a channel (interconnect is shown as 
lines). If power runs on different layers along the spine of a channel, 
this forces signals to change layers. 


(d) A closeup of VDD and VSS buses as they cross. Changing layers 
requires a large number of via contacts to reduce resistance. 


Clock Planning

• A clock spine routing scheme drives all clock pins directly from the clock driver. MGAs and FPGAs often use this fishbone type of clock distribution scheme.
FIGURE 16.16 Clock distribution.

• (a) A clock spine for a gate array.

• (b) A clock spine for a cell-based ASIC (typical chips have thousands of clock nets).

• (c) A clock spine is usually driven from one or more clock-driver cells. Delay in the driver cell is a function of the number of stages and the ratio of output to input capacitance for each stage (taper).

• (d) Clock latency and clock skew. We would like to minimize both latency and skew.
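Given clock arrival times at the leaf nodes (the flip-flop clock pins), latency and skew reduce to simple statistics. This is a minimal Python sketch. It assumes latency is the latest arrival time measured from the clock source and skew is the spread between the latest and earliest arrivals. The pin names and times are hypothetical.

```python
def latency_and_skew(arrival_ns):
    """Return (latency, skew) from clock arrival times at the leaf clock pins."""
    latest = max(arrival_ns.values())
    earliest = min(arrival_ns.values())
    return latest, latest - earliest


arrivals = {"ff_a/CLK": 1.20, "ff_b/CLK": 1.35, "ff_c/CLK": 1.28}  # hypothetical, ns
latency, skew = latency_and_skew(arrivals)
print(f"latency = {latency:.2f} ns, skew = {skew:.2f} ns")  # latency = 1.35 ns, skew = 0.15 ns
```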


Clock Planning (contd.)

FIGURE 16.17 A clock tree. (a) Minimum delay is achieved when the taper of successive stages is about 3. (b) Using a fanout of three at successive nodes. (c) A clock tree for the cell-based ASIC of Figure 16.16 b. We have to balance the clock arrival times at all of the leaf nodes to minimize clock skew.
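A simple model of a clock-driver chain shows why a taper of about 3 works. The Python sketch below assumes each stage's delay is proportional to its taper (the ratio of its load to its input capacitance). It also assumes the chain needs N = ln(R)/ln(taper) stages to drive a total capacitance ratio R. In this idealized model the optimum taper is e ≈ 2.7. The curve is flat enough that a taper of 3 costs under 1 percent more delay. The model ignores each stage's own parasitic capacitance, which pushes the practical optimum somewhat higher. The ratio R = 1000 is hypothetical.

```python
import math


def chain_delay(taper: float, ratio: float) -> float:
    """Normalized delay of a tapered driver chain: N stages, each with delay
    proportional to the taper, where N = ln(ratio) / ln(taper)."""
    stages = math.log(ratio) / math.log(taper)
    return stages * taper


R = 1000.0  # hypothetical ratio of clock-net load to driver input capacitance
tapers = [1.5 + 0.01 * k for k in range(451)]   # 1.50 .. 6.00
best = min(tapers, key=lambda a: chain_delay(a, R))
print(f"best taper ~ {best:.2f}, stages = {math.log(R) / math.log(best):.1f}")
print(f"delay penalty at taper 3: {chain_delay(3, R) / chain_delay(best, R) - 1:.2%}")
```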
Introduction

After the floorplan is complete, the logic cells within the flexible blocks are placed.
Placement is much more suited to automation than floorplanning.

After floorplanning and placement are complete, we can predict both intrablock and interblock capacitances.
Placement Terms and Definitions

CBIC, MGA, and FPGA architectures all have rows of logic cells separated by the interconnect; these are row-based ASICs.

[Figure labels: channel density; feedthrough; feedthrough using a feedthrough cell; logic cell; vertical capacity = 1; m1; over-the-cell routing in m2.]
• INTERCONNECT STRUCTURE. (a) The two-level metal CBIC floorplan.

• (b) A channel from the flexible block A. This channel has a channel height equal to the maximum channel density of 7.

• (c) A channel that uses OTC (over-the-cell) routing in m2.
• GATE-ARRAY INTERCONNECT.
• (a) A small two-level metal gate array (about 4.6 k-gate).
• (b) Routing in a block.

• (c) Channel routing showing channel density and channel capacity. The channel height on a gate array may only be increased in increments of a row. If the interconnect does not use up all of the channel, the rest of the space is wasted. The interconnect in the channel runs in m1 in the horizontal direction with m2 in the vertical direction.


Some commonly used terms:
• Vertical interconnect uses feedthroughs to cross the logic cells.

• FEEDTHROUGH or JUMPER: a vertical strip of metal that runs from the top to bottom of a cell (for double-entry cells), but has no connections inside the cell.

• FEEDTHROUGH CELL (or crosser cell): a dedicated empty cell (with no logic) that can hold one or more vertical interconnects.

• UNCOMMITTED FEEDTHROUGH (also BUILT-IN FEEDTHROUGH, IMPLICIT FEEDTHROUGH, or JUMPER): an unused vertical track (or just track) in a logic cell.

• FEEDTHROUGH PIN or FEEDTHROUGH TERMINAL: an input or output that has connections at both the top and bottom of the standard cell.

• SPACER CELL (usually the same as a feedthrough cell): used to fill space in rows so that the ends of all rows in a flexible block may be aligned to connect to power buses.

• ELECTRICALLY EQUIVALENT CONNECTORS (or EQUIPOTENTIAL CONNECTORS): two connectors for the same physical net.

• LOGICALLY EQUIVALENT CONNECTORS (or FUNCTIONALLY EQUIVALENT CONNECTORS, EQUIVALENT CONNECTORS): for example, the two inputs of a two-input NAND gate may be logically equivalent connectors.


Interconnect Area for CBIC, MGA, and FPGA
HORIZONTAL INTERCONNECT

• In channeled gate arrays and FPGAs, the horizontal interconnect areas (the channels, usually on m1) have a fixed capacity.

• The channel capacity of CBICs and channelless MGAs can be expanded to hold as many interconnects as are needed.

VERTICAL INTERCONNECT
• In the vertical interconnect direction, usually m2, FPGAs still have fixed resources.

• In contrast, the placement tool can always add vertical feedthroughs to a channeled MGA, channelless MGA, or CBIC.
Placement Goals and Objectives

Goal of placement:

To arrange all the logic cells within the flexible blocks on a chip.
Ideally, the objectives of the placement step are to
• Guarantee the router can complete the routing step
• Minimize all the critical net delays
• Make the chip as dense as possible

Additional objectives:
• Minimize power dissipation
• Minimize crosstalk between signals
The most commonly used placement objectives (in current placement tools) are to:
• Minimize the total estimated interconnect length
• Minimize the interconnect congestion
• Meet the timing requirements for critical nets
Measurement of Placement Goals and Objectives
- Interconnect length

Trees on graphs: the graph structures that correspond to making all the connections for a net.

Steiner trees: a special class of trees that minimize the total length of interconnect.

A general Steiner tree may use diagonal connections. Instead, we solve the problem using interconnects on a rectangular grid (rectilinear routing or Manhattan routing).

- We therefore use the Manhattan distance rather than the Euclidean distance.
Euclidean distance between two points is the straight-line distance.
Manhattan distance is the rectangular distance: the sum of the horizontal and vertical separations.

Minimum rectilinear Steiner tree (MRST): the shortest interconnect using a rectangular grid. Determining the MRST is in general an NP-complete problem, which means it is hard to solve.
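Because the MRST is hard to compute exactly, cheaper estimates are useful during placement. This is a minimal Python sketch of the Manhattan distance and of a rectilinear minimum spanning tree (RMST) built with Prim's algorithm. The RMST is a swapped-in approximation, not the MRST itself. By Hwang's theorem, its length is never more than 3/2 times the MRST length. The terminal coordinates are hypothetical.

```python
import math


def manhattan(p, q):
    """Rectilinear (Manhattan) distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def rmst_length(points):
    """Length of a rectilinear minimum spanning tree (Prim's algorithm)."""
    if len(points) < 2:
        return 0
    # Cheapest known connection from each unconnected terminal to the tree.
    cost = {i: manhattan(points[0], points[i]) for i in range(1, len(points))}
    total = 0
    while cost:
        nearest = min(cost, key=cost.get)
        total += cost.pop(nearest)
        for i in cost:
            cost[i] = min(cost[i], manhattan(points[nearest], points[i]))
    return total


print(manhattan((0, 0), (3, 4)), math.dist((0, 0), (3, 4)))  # 7 5.0
net = [(0, 0), (2, 0), (0, 1), (2, 1)]   # hypothetical terminals W, X, Y, Z
print(rmst_length(net))                  # 4
```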
• Placement using trees on graphs. (a) The floorplan.

• (b) An expanded view of the flexible block A showing four rows of standard cells for placement (typical blocks may contain thousands or tens of thousands of logic cells). We want to find the length of the net shown with four terminals, W through Z, given the placement of four logic cells (labeled A.211, A.19, A.43, A.25). (c) The problem for net (W, X, Y, Z) drawn as a graph. The shortest connection is the minimum Steiner tree. (d) The minimum rectilinear Steiner tree using Manhattan routing. The rectangular (Manhattan) interconnect-length measures are shown for each tree.


Measurement of Placement (contd.)
Interconnect Length

A complete graph has connections from each terminal to every other terminal.

The complete-graph measure adds all the interconnect lengths of the complete-graph connection together and then divides by n/2, where n is the number of terminals.

A complete graph on n terminals has $n(n-1)/2$ connections.

Bounding box: the smallest rectangle that encloses all the terminals.
Half-perimeter measure (or bounding-box measure): one-half the perimeter of the bounding box.

The total half-perimeter measure is

$$\sum_{i=1}^{m} h_i$$

where m is the number of nets and $h_i$ is the half-perimeter measure for net i.
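Both measures are easy to compute from terminal coordinates. This is a minimal Python sketch that assumes Manhattan distances for the complete-graph edges. The two nets are hypothetical.

```python
from itertools import combinations


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def complete_graph_measure(terminals):
    """Sum of all n(n-1)/2 pairwise lengths, divided by n/2."""
    n = len(terminals)
    total = sum(manhattan(p, q) for p, q in combinations(terminals, 2))
    return total / (n / 2)


def half_perimeter(terminals):
    """Half the perimeter of the bounding box of the terminals."""
    xs = [x for x, _ in terminals]
    ys = [y for _, y in terminals]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


nets = [
    [(0, 0), (2, 0), (0, 1), (2, 1)],
    [(1, 1), (4, 3)],
]
print(complete_graph_measure(nets[0]), half_perimeter(nets[0]))  # 6.0 3
print(sum(half_perimeter(t) for t in nets))                      # 8
```

For a two-terminal net the two measures agree: both equal the Manhattan distance between the terminals. They diverge as the number of terminals grows.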
Interconnect-length measures.

(a) Complete-graph measure. (b) Half-perimeter measure.
Correlation between total length of chip interconnect and the half-perimeter and complete-graph measures.

[Figure: chip wire length versus wire-length approximation (arbitrary units), comparing the complete-graph measure with an exact prediction.]

• The meander factor specifies, on average, the ratio of the interconnect length created by the routing tool to the interconnect-length estimate used by the placement tool.

• Another problem is that an MRST that minimizes total net length may not minimize net delay.
Interconnect Congestion


There is no point in minimizing the interconnect length if we create a
placement that is too congested to route.


If we use minimum interconnect congestion as an additional placement
objective, we need some way of measuring it.
What we are trying to measure is interconnect density.
One measure of interconnect congestion uses the maximum cut line.


Maximum cut line: Imagine a horizontal or vertical line drawn anywhere
across a chip or block. The number of interconnects that must cross this
line is the cut size (the number of interconnects we cut).


The maximum cut line has the highest cut size.
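A small sketch of cut size and the maximum cut line, assuming only vertical cut lines, nets given as lists of (x, y) terminals, and that a net is counted once if it has terminals on both sides of the line (it must then cross the line at least once). The names are hypothetical.

```python
def cut_size(nets, cut_x):
    """Number of nets with terminals on both sides of the line x = cut_x."""
    count = 0
    for pins in nets:
        xs = [x for x, _ in pins]
        if min(xs) < cut_x < max(xs):
            count += 1
    return count

def maximum_cut_line(nets, candidates):
    """Candidate cut position with the highest cut size (first one on ties)."""
    return max(candidates, key=lambda c: cut_size(nets, c))

nets = [[(0, 0), (3, 1)], [(1, 0), (2, 2)], [(2, 1), (5, 0)]]
cuts = [0.5, 1.5, 2.5, 4.0]
best = maximum_cut_line(nets, cuts)
print(best, cut_size(nets, best))   # 1.5 2
```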


[Figure: rows of standard cells, feedthroughs, and channels in a cell-based ASIC.]

Interconnect congestion for the cell-based ASIC.
- (a) Measurement of congestion.
- (b) An expanded view of flexible block A shows a maximum cut line.


Interconnect Delay 


Many placement tools minimize estimated interconnect length or 
interconnect congestion as objectives. 


The problem with this approach is that a logic cell may be placed a 
long way from another logic cell to which it has just one 
connection. However, the one long connection may be critical as 
far as timing delay is concerned. 


As technology is scaled, interconnection delays become larger 
relative to circuit delays and this problem gets worse. 


The minimum-length Steiner tree does not necessarily
correspond to the interconnect path that minimizes delay.
Interconnect Delay


Half-perimeter measure


In the placement phase we typically use a simple interconnect-length
approximation, such as the half-perimeter measure, for this minimum-delay path.


Timing-driven placement 


In timing-driven placement we must estimate the delay of every net for every trial
placement, possibly for hundreds of thousands of gates.


Parameters to estimate Interconnect Delay

At the placement stage we have no information on which layers and how many vias the interconnect will
use, or how wide it will be. Some tools allow us to include estimates for these
parameters.

Specification of metal usage, the percentage of routing on the different
layers to expect from the router, allows the placement tool to estimate RC
values and delays, and thus minimize delay.




Placement Algorithms 


There are two classes of placement algorithms used in CAD tools: 


» Constructive placement - uses a set of rules to arrive at a constructed
placement.


Examples: the min-cut algorithm and the eigenvalue method.
» Iterative placement improvement.


As in system partitioning, placement usually starts with a constructed solution and then 
improves it using an iterative algorithm. 


Min-cut placement method
- uses successive application of partitioning. The steps are:
  1. Cut the placement area into two pieces.
  2. Swap the logic cells to minimize the cut cost.
  3. Repeat the process from step 1, cutting smaller pieces until all the logic
     cells are placed.


Usually we divide the placement area into bins. 

The size of a bin can vary, from a bin size equal to the base cell (for a gate array) to a 
bin size that would hold several logic cells. 

We can start with a large bin size, to get a rough placement, and then reduce the
bin size to get a final placement.
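The min-cut steps can be sketched as below. This is a simplified illustration under stated assumptions: a one-dimensional row of equal-size slots (one logic cell per bin), nets given as sets of cell names, and a greedy pairwise swap standing in for the partitioning step; a 2-D version would alternate horizontal and vertical cuts. All function names are hypothetical.

```python
from itertools import product

def cut_cost(side, nets):
    """Number of nets with cells on both sides of the cut (side = one half)."""
    return sum(1 for net in nets if 0 < len(net & side) < len(net))

def bisect(cells, nets):
    """Split `cells` into two halves, then greedily swap cells to reduce the cut."""
    cells = sorted(cells)
    left, right = set(cells[:len(cells) // 2]), set(cells[len(cells) // 2:])
    # Step (d): keep only the part of each net that lies inside this piece.
    local = [net & set(cells) for net in nets]
    local = [net for net in local if len(net) > 1]
    cost = cut_cost(left, local)
    improved = True
    while improved:
        improved = False
        for a, b in product(sorted(left), sorted(right)):
            trial = (left - {a}) | {b}
            trial_cost = cut_cost(trial, local)
            if trial_cost < cost:                 # accept an improving swap
                left, right = trial, (right - {b}) | {a}
                cost, improved = trial_cost, True
                break
    return left, right

def min_cut_place(cells, nets, start=0, placement=None):
    """Recursively cut a row of slots until each bin holds one cell."""
    placement = {} if placement is None else placement
    if len(cells) == 1:
        placement[next(iter(cells))] = start
        return placement
    left, right = bisect(cells, nets)
    min_cut_place(left, nets, start, placement)
    min_cut_place(right, nets, start + len(left), placement)
    return placement

nets = [{"a", "c"}, {"b", "d"}]
print(min_cut_place({"a", "b", "c", "d"}, nets))
# {'b': 0, 'd': 1, 'a': 2, 'c': 3}
```

Each connected pair ends up in adjacent slots because the first cut is chosen so that neither net crosses it.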
Min-cut placement. (a) Divide the chip into bins using a grid.
(b) Merge all connections to the center of each bin.
(c) Make a cut and swap logic cells between bins to minimize the cost of the cut.
(d) Take the cut pieces and throw out all the edges that are not inside the piece.
(e) Repeat the process with a new cut and continue until we reach the individual bins.
Placement Algorithms (contd.,)


The eigenvalue placement algorithm uses the cost matrix or weighted connectivity matrix (eigenvalue
methods are also known as spectral methods).


The measure we use is a cost function f that we shall minimize, given by

$$f = \frac{1}{2}\sum_{i,j=1}^{n} c_{ij}\, d_{ij}^{2} \qquad (1)$$

where $C = [c_{ij}]$ is the (possibly weighted) connectivity matrix, and $d_{ij}$ is the Euclidean distance
between the centers of logic cell i and logic cell j.


Since we are going to minimize a cost function that is the square of the distance between logic cells,
these methods are also known as quadratic placement methods.

This type of cost function leads to a simple mathematical solution. We can rewrite the cost function f in
matrix form:

$$f = \frac{1}{2}\sum_{i,j=1}^{n} c_{ij}\left[(x_i - x_j)^2 + (y_i - y_j)^2\right] = \mathbf{x}^{T}B\,\mathbf{x} + \mathbf{y}^{T}B\,\mathbf{y} \qquad (2)$$

B is a symmetric matrix, the disconnection matrix (also called the Laplacian):

$$B = D - C \qquad (3)$$

where D is the diagonal matrix with $D_{ii} = \sum_{j=1}^{n} c_{ij}$.


We can simplify the problem by noticing that it is symmetric in the x- and y-coordinates.


Let us solve the simpler problem of minimizing the cost function for the placement of logic
cells along just the x-axis first.


We can then apply this solution to the more general two-dimensional placement problem. 


Before we solve this simpler problem, we introduce a constraint that the coordinates of the logic 
cells must correspond to valid positions (the cells do not overlap and they are placed on-grid). 


We make another simplifying assumption that all logic cells are the same size and we must place 
them in fixed positions. 


We can define a vector p consisting of the valid positions:

$$\mathbf{p} = [p_1, p_2, \ldots, p_n] \qquad (4)$$

For a valid placement the x-coordinates of the logic cells,

$$\mathbf{x} = [x_1, x_2, \ldots, x_n] \qquad (5)$$

must be a permutation of the fixed positions, p. We can show that requiring the logic cells to be in
fixed positions in this way leads to a series of n equations restricting the values of the logic-cell
coordinates. If we impose all of these constraint equations the problem becomes very complex.
Instead we choose just one of the equations:

$$\sum_{i=1}^{n} x_i^{2} = \sum_{i=1}^{n} p_i^{2} \qquad (6)$$
Simplifying the problem in this way will lead to an approximate solution
to the placement problem.


We can write this single constraint on the x-coordinates in matrix form:

$$\mathbf{x}^{T}\mathbf{x} = P, \qquad P = \sum_{i=1}^{n} p_i^{2} \qquad (7)$$

where P is a constant.
We can now summarize the formulation of the problem, with the simplifications that we have made, for a
one-dimensional solution. We must minimize a cost function, g, where

$$g = \mathbf{x}^{T}B\,\mathbf{x} \qquad (8)$$

subject to the constraint:

$$\mathbf{x}^{T}\mathbf{x} = P \qquad (9)$$


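This is a standard problem that we can solve using a Lagrangian multiplier:

$$\Lambda = \mathbf{x}^{T}B\,\mathbf{x} - \lambda\left[\mathbf{x}^{T}\mathbf{x} - P\right] \qquad (10)$$

To find the value of x that minimizes g we differentiate partially with respect to x and set the
result equal to zero. We get the following equation:

$$[B - \lambda I]\,\mathbf{x} = 0 \qquad (11)$$

This last equation is called the characteristic equation for the disconnection matrix B and occurs
frequently in matrix algebra (this λ has nothing to do with scaling).
The solutions to this equation are the eigenvectors and eigenvalues of B.

Multiplying Eq. (11) by $\mathbf{x}^{T}$ we get:

$$\lambda\,\mathbf{x}^{T}\mathbf{x} = \mathbf{x}^{T}B\,\mathbf{x} \qquad (12)$$

However, since we imposed the constraint $\mathbf{x}^{T}\mathbf{x} = P$ and $\mathbf{x}^{T}B\,\mathbf{x} = g$, then

$$\lambda = \frac{g}{P} \qquad (13)$$

The eigenvectors of the disconnection matrix B are the solutions to our placement problem. Because g = λP,
the cost is smallest for the smallest eigenvalue. The smallest eigenvalue, λ = 0, gives a degenerate solution
with all x-coordinates equal (every logic cell on top of the others), so the useful solution is the eigenvector
of the smallest nonzero eigenvalue.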
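A minimal numerical sketch of the one-dimensional eigenvalue placement, assuming NumPy is available, all logic cells are the same size, and the network is connected (so only one eigenvalue of B is zero). Snapping cells to the valid positions by sorting the eigenvector is an assumed heuristic, not part of the derivation; the function name is hypothetical.

```python
import numpy as np

def eigenvalue_placement_1d(C, positions):
    """Approximate 1-D placement from the disconnection matrix B = D - C.

    C: symmetric n x n connectivity matrix (c_ij = connections between i and j).
    positions: the n valid x-positions p_1 ... p_n.
    Returns the eigenvector solution x (scaled so x^T x = P) and the order
    of the cells obtained by sorting x.
    """
    C = np.asarray(C, dtype=float)
    p = np.asarray(positions, dtype=float)
    B = np.diag(C.sum(axis=1)) - C            # Eq. (3): D_ii = sum_j c_ij
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    # eigvals[0] = 0 is the degenerate solution (all x equal), so use the
    # eigenvector of the smallest nonzero eigenvalue (connected network).
    x = eigvecs[:, 1]
    P = float(np.sum(p ** 2))                 # Eq. (7)
    x = x * np.sqrt(P / (x @ x))              # enforce the constraint x^T x = P
    order = [int(i) for i in np.argsort(x)]   # cell indices, left to right
    return x, order

C = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]                            # four cells connected in a chain
positions = [1, 2, 3, 4]
x, order = eigenvalue_placement_1d(C, positions)
slots = dict(zip(order, sorted(positions)))   # assumed snap-to-grid step
print(order)   # [0, 1, 2, 3] or [3, 2, 1, 0]: the eigenvector sign is arbitrary
```

For this chain the eigenvector is monotonic, so sorting it recovers the natural left-to-right order, and x^T B x equals λP as in Eq. (13).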
[Figure: Eigenvalue placement problem.]

Iterative Placement Improvement


An iterative placement improvement algorithm takes an existing placement and 
tries to improve it by moving the logic cells. There are two parts to the 
algorithm: 

- The selection criteria that decides which logic cells to try moving. 

- The measurement criteria that decides whether to move the selected cells. 


There are several interchange or iterative exchange methods that differ in their
selection and measurement criteria:
- pairwise interchange,
- force-directed interchange,
- force-directed relaxation, and
- force-directed pairwise relaxation.


All of these methods usually consider only pairs of logic cells to be exchanged. 


A source logic cell is picked for trial exchange with a destination logic cell. 
Iterative Placement Improvement
(contd.,)


The pairwise-interchange algorithm

1. Select the source logic cell at random.
2. Try all the other logic cells in turn as the destination logic cell.
3. Use any of the measurement methods we have discussed to decide
   whether to accept the interchange.
4. Repeat the process from step 1, selecting each logic cell in turn as a
   source logic cell.


Neighborhood exchange algorithm 


A modification to pairwise interchange that considers only destination
logic cells in a neighborhood: cells within a certain distance, ε, of
the source logic cell.

Limiting the search area for the destination logic cell to the
ε-neighborhood reduces the search time.
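The pairwise-interchange steps, with the optional ε-neighborhood restriction, can be sketched as below. Several details are assumptions not stated in the text: cells sit on (row, column) slots, the measurement criterion is the total weighted Manhattan length of two-pin connections, a swap is accepted only if it lowers that cost, sources are visited in a fixed order rather than at random, and the neighborhood distance is the Manhattan distance between slots. All names are hypothetical.

```python
def wirelength(placement, connections):
    """Assumed measurement criterion: sum of c_ij * Manhattan distance."""
    total = 0
    for (a, b), c in connections.items():
        (ra, ca), (rb, cb) = placement[a], placement[b]
        total += c * (abs(ra - rb) + abs(ca - cb))
    return total

def pairwise_interchange(placement, connections, eps=None, max_passes=10):
    """Pairwise interchange; eps (if given) limits destinations to an
    eps-neighborhood of the source, measured as Manhattan slot distance."""
    placement = dict(placement)
    cost = wirelength(placement, connections)
    for _ in range(max_passes):
        improved = False
        for src in list(placement):            # each cell in turn is the source
            for dst in list(placement):        # try the others as destination
                if dst == src:
                    continue
                (r1, c1), (r2, c2) = placement[src], placement[dst]
                if eps is not None and abs(r1 - r2) + abs(c1 - c2) > eps:
                    continue                   # outside the eps-neighborhood
                placement[src], placement[dst] = placement[dst], placement[src]
                new_cost = wirelength(placement, connections)
                if new_cost < cost:            # accept an improving interchange
                    cost, improved = new_cost, True
                else:                          # reject: swap back
                    placement[src], placement[dst] = placement[dst], placement[src]
        if not improved:
            break
    return placement, cost

# Four cells in one row of slots (row, column), connected A-B-C-D.
start = {"A": (0, 0), "B": (0, 2), "C": (0, 1), "D": (0, 3)}
connections = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1}
print(wirelength(start, connections))                      # 5
print(pairwise_interchange(start, connections)[1])         # 3
print(pairwise_interchange(start, connections, eps=1)[1])  # 3
```

With eps=1 each source only tries its immediate neighbors, which is much cheaper on a large array; here it still reaches the optimum.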


[Figure: pairwise interchange on a 4 × 4 array of modules numbered 1 to 16, showing a source module, a destination module, a swap, the 1-neighborhood and 2-neighborhood of module 1, and a swap with ε = 2.]


Pairwise interchange.

(a) Swapping the source logic cell with a destination logic cell in pairwise
interchange.

(b) Sometimes we have to swap more than two logic cells at a time to reach an
optimum placement, but this is expensive in computation time. Limiting the search
to neighborhoods reduces the search time. Logic cells within a distance ε of a logic
cell form an ε-neighborhood.

(c) A one-neighborhood.
(d) A two-neighborhood.


Iterative Placement Improvement 
- Force-directed placement method 


- Imagine that all the logic cells we are going to place are
connected by identical springs.

- The number of springs is equal to the number of connections between logic
cells.

- The effect of the springs is to pull connected logic cells together.
- The more highly connected the logic cells, the stronger the pull of the springs.

- The force on a logic cell i due to logic cell j is given by Hooke's law, which
says the force of a spring is proportional to its extension:

$$\mathbf{F}_{ij} = c_{ij}\,\mathbf{x}_{ij}$$

  - The vector component $\mathbf{x}_{ij}$ is directed from the center of logic cell i to the center of
    logic cell j, so the spring pulls cell i toward cell j.
  - The vector magnitude is calculated as either the Euclidean or Manhattan distance
    between the logic cell centers.
  - The $c_{ij}$ form the connectivity or cost matrix (the matrix element $c_{ij}$ is the number of
    connections between logic cell i and logic cell j).
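A minimal NumPy sketch of the spring model, assuming the Euclidean vector form of x_ij and one spring per connection (weight c_ij). The zero-force target is the point where the springs on cell i balance, which is where force-directed methods try to move a cell; the function names are hypothetical.

```python
import numpy as np

def spring_forces(pos, C):
    """Net force on every cell: F_i = sum_j c_ij * x_ij, with x_ij = r_j - r_i.

    pos: (n, 2) array of cell-center coordinates.
    C:   (n, n) symmetric connectivity matrix (c_ij = connections i-j).
    """
    pos = np.asarray(pos, dtype=float)
    C = np.asarray(C, dtype=float)
    diff = pos[None, :, :] - pos[:, None, :]   # diff[i, j] = r_j - r_i
    return (C[:, :, None] * diff).sum(axis=1)  # sum the spring forces over j

def zero_force_target(i, pos, C):
    """Point where the springs on cell i balance: sum_j c_ij r_j / sum_j c_ij."""
    pos = np.asarray(pos, dtype=float)
    C = np.asarray(C, dtype=float)
    return C[i] @ pos / C[i].sum()

pos = [[0, 0], [2, 0], [0, 2]]
C = [[0, 2, 1],
     [2, 0, 0],
     [1, 0, 0]]                         # cell 0 has two connections to cell 1
F = spring_forces(pos, C)
print(F[0])                             # [4. 2.]
print(zero_force_target(0, pos, C))     # approximately (4/3, 2/3)
```

Because C is symmetric, the forces sum to zero over all cells, as expected for a closed system of springs.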
FIGURE 16.27 Force-directed placement.
(a) A network with nine logic cells.
(b) We make a grid (one logic cell per bin).
(c) Forces are calculated as if springs were attached to
the centers of each logic cell for each connection. The
two nets connecting logic cells A and I correspond to
two springs.
(d) The forces are proportional to the spring
extensions.


Iterative Placement Improvement 
(contd.,) 


Force-directed placement algorithms:


» The force-directed interchange algorithm uses the force vector to select a
pair of logic cells to swap.
» In force-directed relaxation, a chain of logic cells is moved.
» The force-directed pairwise relaxation algorithm swaps one pair of logic
cells at a time.


Force-directed solutions minimize the energy of the system, corresponding
to minimizing the sum of the squares of the distances separating logic
cells.


Force-directed placement algorithms thus also use a quadratic cost function.
FIGURE 16.28 Force-directed iterative placement improvement.
(a) Force-directed interchange.
(b) Force-directed relaxation.
(c) Force-directed pairwise relaxation.
Placement Using Simulated Annealing


Applying simulated annealing to placement, the algorithm is as follows:

1. Select logic cells for a trial interchange, usually at random.
2. Evaluate the objective function E for the new placement.
3. If ΔE is negative or zero, then exchange the logic cells.
4. If ΔE is positive, then exchange the logic cells with a probability of
   exp(−ΔE/T).
5. Go back to step 1 for a fixed number of times, and then lower the
   temperature T according to a cooling schedule: $T_{n+1} = 0.9\,T_n$, for
   example.
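The annealing steps can be sketched as follows. Assumptions not in the text: the objective E is a hypothetical chain-length cost for four cells in a row of slots; the starting temperature, cooling factor 0.9, number of trials per temperature, and stopping temperature are illustrative values; and the sketch also remembers the best placement seen, which the steps above do not require.

```python
import math
import random

def accept(delta_e, temperature, rng):
    """Steps 3-4: accept dE <= 0; accept dE > 0 with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def anneal(placement, cost_fn, t0=10.0, alpha=0.9, moves_per_t=100,
           t_min=0.01, seed=0):
    rng = random.Random(seed)
    placement = dict(placement)
    cells = list(placement)
    cost = cost_fn(placement)
    best, best_cost = dict(placement), cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):            # step 5: fixed number of trials per T
            a, b = rng.sample(cells, 2)         # step 1: random trial interchange
            placement[a], placement[b] = placement[b], placement[a]
            new_cost = cost_fn(placement)       # step 2: evaluate E
            if accept(new_cost - cost, t, rng):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = dict(placement), cost
            else:                               # rejected: undo the interchange
                placement[a], placement[b] = placement[b], placement[a]
        t *= alpha                              # cooling schedule T_{n+1} = alpha * T_n
    return best, best_cost

def chain_length(placement):
    """Hypothetical objective: cells A-B-C-D chained, placed in a row of slots."""
    chain = ["A", "B", "C", "D"]
    return sum(abs(placement[u] - placement[v]) for u, v in zip(chain, chain[1:]))

start = {"A": 0, "C": 1, "B": 2, "D": 3}
best, best_cost = anneal(start, chain_length)
print(chain_length(start), best_cost)
```

At high temperature almost every interchange is accepted, so the search can climb out of poor placements; as T falls the rule approaches the greedy acceptance used in pairwise interchange.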


Comparison of Placement Algorithms 


- Min-cut based constructive placement is faster than simulated annealing.

- Simulated annealing is capable of giving better results at the expense of
long computer run times.

- The iterative improvement methods described earlier are capable of
giving results as good as simulated annealing, but they use more complex
algorithms.


Timing-Driven Placement Methods 


Minimizing delay is becoming more and more important as a placement 
objective. 
There are two main approaches: 


» net based 


» path based. 


We know that we can use net weights in our algorithms.


The problem is to calculate the weights.


One method finds the n most critical paths. The net weights might then be the number of times each net appears in
this list.
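The counting step is simple enough to show directly. This sketch assumes each critical path is given as a sequence of net names, with a net counted at most once per path; how the n most critical paths are found (typically by a timing-analysis tool) is outside the sketch, and the function name is hypothetical.

```python
from collections import Counter

def net_weights(critical_paths):
    """Weight of each net = number of critical paths that use it."""
    return Counter(net for path in critical_paths for net in set(path))

paths = [["n1", "n2", "n3"], ["n1", "n4"], ["n1", "n2"]]
w = net_weights(paths)
print(w["n1"], w["n2"], w["n3"], w["n5"])   # 3 2 1 0
```

Nets that never appear get a count of zero; in practice a tool would likely keep a base weight for them so they are not ignored altogether.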
Timing-Driven Placement Methods


Another method of finding the net weights uses the zero-slack algorithm [Hauge et al., 1987].


Figure 16.29 shows how this works (all times are in nanoseconds).


We start with the primary inputs, at which the arrival times (this is the original definition;
some people use the term actual times) of each signal are known.


We also know the required times for the primary outputs: the points in time at which we want the
signals to be valid.


We work forward from the primary inputs and backward from the primary
outputs to determine the arrival and required times at each input pin for each net.


The difference between the required and arrival times at each input pin is the
slack time (the time we have to spare).
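The forward and backward passes can be sketched as below. This computes arrival times, required times, and slacks only (not the full zero-slack algorithm), assuming the timing graph is a directed acyclic graph with a fixed delay on each arc, every source node is a primary input with a known arrival time, and every sink is a primary output with a required time. The function name is hypothetical; `graphlib` is in the Python standard library from 3.9.

```python
from graphlib import TopologicalSorter

def slacks(edges, arrival_in, required_out):
    """Slack = required time - arrival time at every node of the timing graph.

    edges:        {(u, v): delay} for each timing arc, in ns.
    arrival_in:   {primary input: arrival time}.
    required_out: {primary output: required time}.
    """
    preds, succs = {}, {}
    for (u, v), d in edges.items():
        preds.setdefault(v, []).append((u, d))
        succs.setdefault(u, []).append((v, d))
    graph = {v: [u for u, _ in ps] for v, ps in preds.items()}
    order = list(TopologicalSorter(graph).static_order())
    arrival = dict(arrival_in)
    for n in order:                       # forward from the primary inputs
        if n in preds:
            arrival[n] = max(arrival[u] + d for u, d in preds[n])
    required = dict(required_out)
    for n in reversed(order):             # backward from the primary outputs
        if n in succs:
            required[n] = min(required[v] - d for v, d in succs[n])
    return {n: required[n] - arrival[n] for n in order}

edges = {("a", "g"): 2, ("b", "g"): 1, ("g", "out"): 3}
s = slacks(edges, arrival_in={"a": 0, "b": 0}, required_out={"out": 6})
print(s["a"], s["b"], s["g"], s["out"])   # 1 2 1 1
```

Input b has more slack than input a because its arc into g is 1 ns shorter.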
Timing-Driven Placement Methods


- The zero-slack algorithm adds delay to each net until the slacks
are zero, as shown in Figure 16.29 (b).

- The net delays can then be converted to weights or constraints in
the placement.

- The algorithm assumes that all the gates on a net switch at the same time, so that the
net delay can be placed at the output of the gate driving the net. This is a
rather poor timing model, but the best we can use without any routing
information.




Timing-Driven Placement Methods


Advantage 

An important point to remember is that adjusting the net weight, even for 
every net on a chip, does not theoretically make the placement algorithms any 
more complex—we have to deal with the numbers anyway. It does not matter 
whether the net weight is 1 or 6.6, for example. 

Disadvantage 

The practical problem, however, is getting the weight information for 
each net (usually in the form of timing constraints) from a synthesis tool 
or timing verifier. 

These files can easily be hundreds of megabytes in size. 


With the zero-slack algorithm we simplify but overconstrain the problem. 
Solution


Deal with paths, such as the critical path shown in Figure 16.29 (a), and not just
nets.


Disadvantage of path-based methods
They are complex, and not all commercial tools have this capability.


Path delays between gates cannot be predicted accurately with only placement
information, because we use simple approximations to the total net length (such as
the half-perimeter measure) and then use this to estimate a net delay (the
same to each pin on a net).

It is not until the routing step that we can make accurate estimates of the
actual interconnect delays.
Physical design flow
UNIT IV


ROUTING 
Dr.K.Kalyani 
AP, ECE, 
TCE. 
Introduction


* Once the designer has
  * floorplanned a chip, and
  * placed the logic cells within the flexible blocks,
  it is time to make the connections by routing the chip.
* This is still a hard problem that is made easier by dividing it into smaller
  problems.


* Routing is usually split into global routing followed by detailed routing.
The starting point of the floorplanning and placement steps for the Viterbi decoder:
a collection of standard cells with no room set aside yet for routing.


Small boxes that look like bricks - outlines of the standard cells.


Largest standard cells, at the bottom of the display (labeled dfctnb)
- 188 D flip-flops.


'*' symbols - drawing origins of the standard cells; for the D flip-flops
they are shifted to the left and below the logic cell bottom left-hand
corner.
Large box surrounding all the logic cells - estimated chip size.


(This is a screen shot from Cadence Cell Ensemble.)
FIGURE 17.1 The core of the Viterbi decoder chip after placement (a screen shot from
Cadence Cell Ensemble).

FIGURE 17.2 The core of the Viterbi decoder chip after the completion of global and detailed
routing (a screen shot from Cadence Cell Ensemble). This chip uses two-level metal. Although you
cannot see the difference, m1 runs in the horizontal direction and m2 in the vertical direction.


Global Routing 


The details of global routing differ slightly between cell-based ASICs, gate arrays, and FPGAs, but the principles are the
same.
A global router does not make any connections; it just plans them.


We global route the whole chip (or large pieces if it is a large chip) before detail
routing the whole chip (or the pieces).
Goals and Objectives


- Input to routing:
  - floorplan that includes the locations of all the fixed and flexible blocks;
  - placement information for flexible blocks;
  - locations of all the logic cells.


- Goal of global routing:
  - to provide complete instructions to the detailed router on where to
    route every net.


- Objectives of global routing:
  - minimize the total interconnect length;
  - maximize the probability that the detailed router can complete
    the routing;
  - minimize the critical path delay.
Measurement of Interconnect Delay


- After placement, the logic cell positions are fixed and the global router can afford to use
better estimates of the interconnect delay.

- To illustrate one method, we shall use the Elmore constant to estimate the interconnect
delay for the circuit shown in Figure 17.3.


[Figure 17.3 labels: pull-down resistance of inverter A; resistance of interconnect segments.]


FIGURE 17.3 Measuring the delay of a net. (a) A simple circuit with an inverter A driving a
net with a fanout of two. Voltages V1, V2, V3, and V4 are the voltages at intermediate
points along the net. (b) The layout showing the net segments (pieces of interconnect).
(c) The RC model with each segment replaced by a capacitance and resistance. The ideal
switch and pull-down resistance $R_{pd}$ model the inverter A.


The problem is to find the voltages at the inputs to logic cells B and C taking
into account the parasitic resistance and capacitance of the metal interconnect.
Figure 17.3 (c) models logic cell A as an ideal switch with a pull-down
resistance equal to $R_{pd}$ and models the metal interconnect using resistors and
capacitors for each segment of the interconnect.


The Elmore constant for node 4 (labeled $V_4$) in the network
shown in Figure 17.3 (c) is

$$\tau_{D4} = \sum_{k=1}^{4} R_{k4} C_k = R_{14}C_1 + R_{24}C_2 + R_{34}C_3 + R_{44}C_4 \qquad (17.1)$$

where

$$R_{14} = R_{pd} + R_1 \quad \text{(resistance to } V_0 \text{ shared by node 1 and node 4)}$$
$$R_{24} = R_{pd} + R_1$$
$$R_{34} = R_{pd} + R_1 + R_3$$
$$R_{44} = R_{pd} + R_1 + R_3 + R_4 \qquad (17.2)$$

In Eq. 17.2 notice that $R_{24} = R_{pd} + R_1$ (and not $R_{pd} + R_1 + R_2$) because
$R_1$ is the resistance to $V_0$ (ground) shared by node 2 and node 4.


Suppose we have the following parameters (from the generic 0.5 µm CMOS
process, G5) for the layout shown in Figure 17.3 (b):


m2 resistance is 50 mΩ/square.

m2 capacitance (for a minimum-width line) is 0.2 pF/mm.
4X inverter delay is 0.02 ns + 0.5 $C_L$ ($C_L$ is in picofarads).
Delay is measured using 0.35/0.65 output trip points.

m2 minimum width is 3λ = 0.9 µm.

1X inverter input capacitance is 0.02 pF (a standard load).


First we need to find the pull-down resistance, $R_{pd}$, of the 4X inverter. If we
model the gate with a linear pull-down resistor, $R_{pd}$, driving a load $C_L$, the
output waveform is $\exp(-t/(C_L R_{pd}))$ (normalized to 1 V).


The output falls to 37 percent of its initial value (that is, it completes 63 percent of its transition) when $t = C_L R_{pd}$,
because exp(−1) ≈ 0.37. Then, because the delay is measured with a 0.35/0.65 trip point (65 percent of the transition, close to 63 percent), the
constant 0.5 ns/pF = 0.5 kΩ is very close to the equivalent pull-down
resistance. Thus, $R_{pd}$ ≈ 500 Ω.
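$$R_1 = R_2 = \frac{(0.1\ \text{mm})(50 \times 10^{-3}\ \Omega)}{0.9\ \mu\text{m}} \approx 6\ \Omega$$
$$R_3 = \frac{(1\ \text{mm})(50 \times 10^{-3}\ \Omega)}{0.9\ \mu\text{m}} \approx 56\ \Omega$$
$$R_4 = \frac{(2\ \text{mm})(50 \times 10^{-3}\ \Omega)}{0.9\ \mu\text{m}} = 2R_3 \approx 112\ \Omega$$
$$C_1 = (0.1\ \text{mm})(0.2\ \text{pF mm}^{-1}) = 0.02\ \text{pF}$$
$$C_2 = (0.1\ \text{mm})(0.2\ \text{pF mm}^{-1}) + 0.02\ \text{pF} = 0.04\ \text{pF}$$
$$C_3 = (1\ \text{mm})(0.2\ \text{pF mm}^{-1}) = 0.2\ \text{pF}$$
$$C_4 = (2\ \text{mm})(0.2\ \text{pF mm}^{-1}) + 0.02\ \text{pF} = 0.42\ \text{pF}$$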


*m2 resistance is 50 т О square. 

*m2 capacitance (for a minimum-width 
line) is 0.2 pFmm -1. 

“АХ inverter delay is 0.02 ns + 0.5 Сп ( 
C, is in picofarads). 

*Delay is measured using 0.35/0.65 
output trip points. 

*m2 minimum width is 3 A = 0.9 um. 
*1X inverter input capacitance is 0.02 
pF (a standard load). 
Summary: $R_1 = R_2 = 6\ \Omega$, $R_3 = 56\ \Omega$, $R_4 = 112\ \Omega$; $C_1 = 0.02$ pF, $C_2 = 0.04$ pF, $C_3 = 0.2$ pF, $C_4 = 0.42$ pF.


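Now we can calculate the path resistance, $R_{ki}$, values (notice that $R_{ki} = R_{ik}$):

$$R_{14} = 500\ \Omega + 6\ \Omega = 506\ \Omega$$
$$R_{24} = 500\ \Omega + 6\ \Omega = 506\ \Omega$$
$$R_{34} = 500\ \Omega + 6\ \Omega + 56\ \Omega = 562\ \Omega$$
$$R_{44} = 500\ \Omega + 6\ \Omega + 56\ \Omega + 112\ \Omega = 674\ \Omega \qquad (17.5)$$

Finally, we can calculate Elmore's constants for node 4 and node 2 as follows:

$$\tau_{D4} = R_{14}C_1 + R_{24}C_2 + R_{34}C_3 + R_{44}C_4$$
$$= (506)(0.02) + (506)(0.04) + (562)(0.2) + (674)(0.42) = 425.84\ \text{ps} \qquad (17.6)$$

$$\tau_{D2} = R_{12}C_1 + R_{22}C_2 + R_{32}C_3 + R_{42}C_4$$
$$= (R_{pd} + R_1)(C_1 + C_3 + C_4) + (R_{pd} + R_1 + R_2)\,C_2$$
$$= (500 + 6)(0.02 + 0.2 + 0.42) + (500 + 6 + 6)(0.04) = 344.32\ \text{ps} \qquad (17.7)$$

A lumped-delay model neglects the effects of interconnect
resistance and simply sums all the node capacitances (the
lumped capacitance) as follows:

$$\tau_{L} = R_{pd}(C_1 + C_2 + C_3 + C_4) = (500)(0.02 + 0.04 + 0.2 + 0.42) = 340\ \text{ps} \qquad (17.8)$$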
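The Elmore sums can be checked with a short script. This sketch assumes the RC tree is described by parent pointers, that each node's resistance is that of the segment feeding it, and that R_ik is the resistance shared by the paths from the driver to node i and to node k. Node `n0` is a hypothetical label for the inverter output node: it carries $R_{pd}$ and, as in the calculation above, no capacitance. The values are the rounded ones used in the text.

```python
def path_from_driver(node, parent):
    """Nodes on the path from `node` back to the driver (inclusive)."""
    path = set()
    while node is not None:
        path.add(node)
        node = parent[node]
    return path

def elmore_delay(k, parent, res, cap):
    """tau_Dk = sum_i R_ik * C_i, where R_ik is the resistance shared by the
    paths from the driver to node i and to node k.

    parent[n]: upstream node (None at the driver); res[n]: resistance of the
    segment feeding n (ohms); cap[n]: capacitance at n (pF). Result in ps.
    """
    path_k = path_from_driver(k, parent)
    tau = 0.0
    for i, c in cap.items():
        shared = path_k & path_from_driver(i, parent)
        tau += sum(res[n] for n in shared) * c
    return tau

# Figure 17.3 (c): segment 1 splits into segment 2 (to B) and segments 3, 4 (to C).
parent = {"n0": None, "n1": "n0", "n2": "n1", "n3": "n1", "n4": "n3"}
res = {"n0": 500, "n1": 6, "n2": 6, "n3": 56, "n4": 112}
cap = {"n0": 0.0, "n1": 0.02, "n2": 0.04, "n3": 0.2, "n4": 0.42}

print(round(elmore_delay("n4", parent, res, cap), 2))   # 425.84
print(round(elmore_delay("n2", parent, res, cap), 2))   # 344.32
print(round(res["n0"] * sum(cap.values()), 2))          # 340.0 (lumped model)
```

Units work out directly because ohms times picofarads gives picoseconds.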
Measurement of delay


The delay of the inverter can be assigned as follows:

- 20 ps (the intrinsic delay, 0.02 ns, due to the cell output capacitance),
- 340 ps (due to the pull-down resistance and the output capacitance),
- 4 ps (due to the interconnect from A to B),
- 85 ps (due to the interconnect from A to C).


Measurement of Interconnect Delay (contd.,) 


- Even using the Elmore constant we still made the following assumptions in
estimating the path delays:
  - A step-function waveform drives the net.
  - The delay is measured from when the gate input changes.
  - The delay is equal to the time constant of an exponential waveform
    that approximates the actual output waveform.
  - The interconnect is modeled by discrete resistance and capacitance
    elements.

- The global router could use more sophisticated estimates that remove some
of these assumptions, but there is a limit to the accuracy with which delay
can be estimated during global routing.

- When the global router attempts to minimize interconnect delay, there is
an important difference between a path and a net.

- The path that minimizes the delay between two terminals on a net is not
necessarily the same as the path that minimizes the total path length of
the net.




Global Routing Methods 


Many of the methods used in global routing are based on the solutions to the 
tree on a graph problem. 


Sequential routing:

One approach to global routing takes each net in turn and calculates
the shortest path using tree-on-graph algorithms, with the added
restriction of using the available channels.


Disadvantage:


As a sequential routing algorithm proceeds, some channels will
become more congested, since they hold more interconnects than
others.


In the case of FPGAs and channeled gate arrays, the channels have a
fixed channel capacity and can only hold a certain number of
interconnects.
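A sketch of sequential routing on a channel graph, assuming two-terminal nets, a shortest-path search (Dijkstra's algorithm, one common choice) over edges weighted by channel length, and channels that can hold a fixed number of interconnects. Full channels are skipped, so the result depends on the order in which nets are processed. All names and the example graph are hypothetical.

```python
import heapq

def shortest_path(graph, usage, capacity, src, dst):
    """Dijkstra search over the channel graph, skipping full channels.

    graph: {node: [(neighbor, channel, length), ...]}
    Returns the list of channels used, or None if no route fits.
    """
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue                                 # stale heap entry
        for v, ch, length in graph[u]:
            if usage[ch] >= capacity[ch]:
                continue                             # channel already full
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, (u, ch)
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    channels, node = [], dst
    while node != src:
        node, ch = prev[node]
        channels.append(ch)
    return channels[::-1]

def sequential_route(graph, capacity, nets):
    """Route two-terminal nets one at a time, in the given order."""
    usage = {ch: 0 for ch in capacity}
    routes = {}
    for name, src, dst in nets:
        path = shortest_path(graph, usage, capacity, src, dst)
        routes[name] = path
        for ch in path or []:
            usage[ch] += 1                           # channel now holds one more net
    return routes

graph = {
    "A": [("B", 1, 1), ("C", 3, 2)],
    "B": [("A", 1, 1), ("D", 2, 1)],
    "C": [("A", 3, 2), ("D", 4, 2)],
    "D": [("B", 2, 1), ("C", 4, 2)],
}
capacity = {1: 1, 2: 1, 3: 1, 4: 1}                  # each channel holds one net
nets = [("n1", "A", "D"), ("n2", "A", "D"), ("n3", "A", "D")]
print(sequential_route(graph, capacity, nets))
# {'n1': [1, 2], 'n2': [3, 4], 'n3': None}
```

The first net takes the short route through channels 1 and 2, the second is forced onto the longer route, and the third cannot be routed at all; swapping the order of the nets would change which net gets the short route.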


Global Routing Methods (contd.,) 


There are two different ways that a global router normally handles this problem:
1. Order-independent routing
2. Order-dependent routing


In order-independent routing, a global router proceeds by routing each net, ignoring
how crowded the channels are. Whether a particular net is processed first or last does
not matter; the channel assignment will be the same.


In order-independent routing, after all the interconnects are assigned to channels, the
global router returns to those channels that are the most crowded and reassigns some
interconnects to other, less crowded, channels.


In order-dependent routing, a global router considers the number of interconnects already
placed in various channels as it proceeds. The routing is still sequential, but now the order
of processing the nets will affect the results.


Iterative improvement or simulated annealing may be applied to the solutions found 
from both order-dependent and order-independent algorithms. 
Global Routing Methods (contd.,)


Hierarchical routing handles all nets at a particular level at once.


Rather than handling all of the nets on the chip at the same time, the global-
routing problem is made more tractable by dividing the chip area into levels of
hierarchy.


By considering only one level of hierarchy at a time, the size of the problem is
reduced at each level.


There are two ways to traverse the levels of hierarchy:


- The top-down approach starts at the whole chip, or highest level, and
proceeds down to the logic cells.

- The bottom-up approach starts at the lowest level of hierarchy and globally
routes the smallest areas first.
Global Routing


There are two types of areas to global route: 
- between blocks 


- inside the flexible blocks 
Global Routing Between Blocks


FIGURE 17.4 Global routing for a cell-based ASIC formulated
as a graph problem. (a) A cell-based ASIC with numbered
channels. (b) The channels form the edges of a graph. (c) The
channel-intersection graph. Each channel corresponds to an
edge on a graph whose weight corresponds to the channel
length.


Global Routing Between Blocks 
( contd.,) 
FIGURE 17.5 Finding paths in global routing. (a) A cell-based ASIC showing a single net
with a fanout of four (five terminals). We have to order the numbered channels to complete
the interconnect path for terminals A1 through F1. (b) The terminals are projected to the
center of the nearest channel, forming a graph. A minimum-length tree for the net that uses
the channels and takes into account the channel capacities. (c) The minimum-length tree
does not necessarily correspond to minimum delay. If we wish to minimize the delay
from terminal A1 to D1, a different tree might be better.


Global Routing Between Blocks 
( contd.,) 


Global routing is very similar for cell-based ASICs and gate arrays, but there 
is a very important difference between the types of channels in these 
ASICs. 


In channeled gate-arrays and FPGAs the size, number, and location of 
channels are fixed. 


Advantage - the global router can allocate as many interconnects to each 
channel as it likes, since that space is committed anyway. 


Disadvantage - there is a maximum number of interconnects that each 
channel can hold. 


If the global router needs more room, even in just one channel on the whole 
chip, the designer has to repeat the placement-and-routing steps and try again 
(or use a bigger chip). 
Global Routing Inside Flexible Blocks

FIGURE 17.6 Gate-array global routing. (a) A small gate array. (b) An enlarged view of the routing. The
top channel uses three rows of gate-array base cells; the other channels use only one. (c) A further
enlarged view showing how the routing in the channels connects to the logic cells. (d) One of the logic
cells, an inverter. (e) There are seven horizontal wiring tracks available in one row of gate-array base
cells, so the channel capacity is 7.


Global Routing Inside Flexible Blocks (contd.,)


[Figure labels: input, output, feedthrough.]


FIGURE 17.7 The gate-array inverter from Figure 17.6
(d). (a) An oxide-isolated gate-array base cell, showing
the diffusion and polysilicon layers. (b) The metal and
contact layers for the inverter in a 2LM (two-level
metal) process. (c) The router's view of the cell in a 3LM process.
Global Routing Inside Flexible Blocks

FIGURE 17.8 Global routing a gate array. (a) A single global-routing cell (GRC or routing bin) containing 2-by-4
gate-array base cells. For this choice of routing bin the maximum horizontal track capacity is 14, the maximum
vertical track capacity is 12. The routing bin labeled C3 contains three logic cells, two of which have feedthroughs
marked 'f'. This results in the edge capacities shown. (b) A view of the top left-hand corner of the gate array
showing 28 routing bins. The global router uses the edge capacities to find a sequence of routing bins to connect
Timing-Driven Methods


As in timing-driven placement, there are two main approaches to timing-driven routing: 
- net-based and path-based. 


Path-based methods are more sophisticated. 


For example, if there is a critical path from logic cell A to B to C, the global router 
may increase the delay due to the interconnect between logic cells A and B if it 
can reduce the delay between logic cells B and C. 


Placement and global routing tools may or may not use the same algorithm to 
estimate net delay. If these tools are from different companies, the algorithms are 
probably different. 


The algorithms must be compatible, however. There is no use performing placement to 
minimize predicted delay if the global router uses completely different 
measurement methods. 


Companies that produce floorplanning and placement tools make sure that the 
output is compatible with different routing tools—often to the extent of using different 
algorithms to target different routers. 
Back-annotation


The global router can give not just an estimate of the total net
length (which was all we knew at the placement stage), but the
resistance and capacitance of each path in each net. This RC
information is used to calculate net delays.


We back-annotate this net delay information
- to the synthesis tool for in-place optimization, or
- to a timing verifier to make sure there are no timing surprises.


Differences in timing predictions at this point arise due to the 
different ways in which the placement algorithms estimate the 
paths and the way the global router actually builds the paths. 
Detailed Routing


Goal: 


The goal of detailed routing is to complete all the connections between logic 
cells. 


Objectives: 


The most common objective is to minimize one or more of the following: 
- The total interconnect length and area 
- The number of layer changes that the connections have to make
- The delay of critical paths 


Minimizing the number of layer changes corresponds to minimizing the 
number of vias that add parasitic resistance and capacitance to a connection. 
Measurement of Channel Density
Definition of Local and Global channel density 


[Figure: local densities measured at several columns of a channel; the maximum local density, 4, is the global density, or channel density.]


- The maximum local density of a channel is its global density (the channel density).
- The channel density is less than or equal to the channel capacity.
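These definitions can be sketched as follows, assuming each net's horizontal segment is given as an inclusive column interval (a simplification; the function names and example nets are hypothetical):

```python
def local_densities(nets, num_columns):
    """Count the nets whose horizontal segment spans each column.

    nets maps a net name to (left_column, right_column), both inclusive.
    """
    density = [0] * num_columns
    for left, right in nets.values():
        for column in range(left, right + 1):
            density[column] += 1
    return density


def channel_density(nets, num_columns):
    """Global (channel) density: the maximum local density."""
    return max(local_densities(nets, num_columns), default=0)


def fits_channel(nets, num_columns, capacity):
    """The channel density must not exceed the channel capacity (in tracks)."""
    return channel_density(nets, num_columns) <= capacity


# Hypothetical channel with four nets across nine columns.
nets = {"n1": (0, 3), "n2": (1, 5), "n3": (2, 4), "n4": (6, 8)}
print(local_densities(nets, 9))   # [1, 2, 3, 3, 2, 1, 1, 1, 1]
print(channel_density(nets, 9))   # 3
```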
Left-edge algorithm


The left-edge algorithm (LEA) is the basis for several routing algorithms
[Hashimoto and Stevens, 1971].


The LEA applies to two-layer channel routing, using one layer for the trunks and the 
other layer for the branches. 


For example, m1 may be used in the horizontal direction and m2 in the vertical 
direction. 


The LEA proceeds as follows: 


1. Sort the nets according to the leftmost edges of the nets' horizontal
segments.
2. Assign the first net on the list to the first free track.
3. Assign the next net on the list that will fit to the current track.
4. Repeat step 3 until no more nets will fit in the current track.
5. Repeat steps 2-4 until all nets have been assigned to tracks.
6. Connect the net segments to the top and bottom of the channel.
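Steps 1 to 5 can be sketched as below. The sketch assumes each trunk is an inclusive column interval and ignores vertical constraints, so it does not perform step 6. The function name and nets are hypothetical.

```python
def left_edge(trunks):
    """Left-edge algorithm for a two-layer channel (vertical constraints ignored).

    trunks maps a net name to its (left_column, right_column) interval.
    Returns a list of tracks, each a list of net names in left-to-right order.
    """
    pending = sorted(trunks, key=lambda net: trunks[net][0])   # step 1
    tracks = []
    while pending:                                             # step 5
        track, right_end, unplaced = [], None, []
        for net in pending:                                    # steps 2-4
            left, right = trunks[net]
            if right_end is None or left > right_end:          # fits on this track
                track.append(net)
                right_end = right
            else:
                unplaced.append(net)
        tracks.append(track)
        pending = unplaced
    return tracks


# Hypothetical trunks; the channel density here is 3.
trunks = {"n1": (0, 3), "n2": (1, 5), "n3": (2, 4), "n4": (6, 8)}
tracks = left_edge(trunks)
print(tracks)  # [['n1', 'n4'], ['n2'], ['n3']]
```

Without vertical constraints, the number of tracks the LEA uses equals the channel density (3 here).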
[Figure: Left-edge algorithm. (a) Segments sorted by their left edge. Net 6 has three terminals; the left edge of segment 7 connects to the top of the channel, and the left edge of segment 6 connects to the bottom of the channel. (b) Segments assigned to tracks by their left edges. (c) The completed channel route using m1, m2, and vias.]
Constraints and Routing Graphs


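- Two terminals that are in the same column in a channel create a
vertical constraint: the trunk of the net connected to the top terminal must lie above the trunk of the net connected to the bottom terminal.


- Overlap between the trunks of nets is called a horizontal constraint: nets whose trunks overlap cannot share a track.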


[Figure: Routing graphs, panels (a) to (c). The set of four nodes (3, 5, 6, 7) is the largest completely connected loop.]
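The two kinds of constraint can be turned into graphs as in this sketch. It assumes the channel is described by hypothetical per-column lists of the nets on the top and bottom pins, and that trunks are inclusive column intervals. A cycle in the vertical-constraint graph means that no assignment of one trunk per net can satisfy every vertical constraint. With trunks treated as intervals, the largest completely connected set of nodes in the horizontal-constraint graph is as large as the channel density.

```python
def vertical_constraints(top, bottom):
    """Directed edges (upper, lower) built from the channel's pin columns.

    top[i] and bottom[i] name the net on the top/bottom pin of column i
    (None means no pin). The upper net's trunk must lie above the lower one.
    """
    return {(t, b) for t, b in zip(top, bottom)
            if t is not None and b is not None and t != b}


def horizontal_constraints(trunks):
    """Undirected edges between nets whose trunk intervals overlap."""
    names = sorted(trunks)
    edges = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (l1, r1), (l2, r2) = trunks[a], trunks[b]
            if l1 <= r2 and l2 <= r1:   # the intervals share at least one column
                edges.add((a, b))
    return edges


def has_cycle(edges):
    """True if the directed graph given by edges contains a cycle (DFS)."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "active"
        for nxt in graph.get(node, []):
            if state.get(nxt) == "active":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))


top = ["n1", "n2", None, "n3"]
bottom = ["n2", "n3", "n1", None]
vcg = vertical_constraints(top, bottom)
print(sorted(vcg), has_cycle(vcg))  # [('n1', 'n2'), ('n2', 'n3')] False
```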
Dogleg router


[Figure: Dogleg routing, panels (a) to (c). A dogleg uses more than one trunk per net.]


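- A dogleg router removes the restriction that each net can use only one
track or trunk. Splitting a net into several trunks can break a cycle of vertical constraints that would otherwise make the channel unroutable with one trunk per net.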
Area Routing Algorithm: Lee Maze Algorithm
[For general-shaped areas]




The Lee algorithm finds a path from the source (X) to the target (Y) by emitting a wave from both
the source and the target at the same time.

Successive outward moves are marked in each bin.

Once the target is reached, the path is found by backtracking (if there is a
choice of bins with equal labeled values, choose the bin that avoids changing
direction). (The original form of the Lee algorithm uses a single wave.)
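A minimal sketch of the original single-wave form, assuming the routing area is a grid of bins encoded as strings in which '#' marks a blocked bin (the encoding and function name are hypothetical). A breadth-first search labels each bin with its distance from the source. Backtracking from the target then follows bins labeled one less at each step and keeps the current direction whenever it can.

```python
from collections import deque

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def lee_route(grid, source, target):
    """Single-wave Lee maze routing; returns a list of bins or None."""
    rows, cols = len(grid), len(grid[0])
    label = {source: 0}            # bin -> number of moves from the source
    wave = deque([source])
    while wave and target not in label:
        r, c = wave.popleft()
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in label):
                label[(nr, nc)] = label[(r, c)] + 1
                wave.append((nr, nc))
    if target not in label:
        return None                # no path: the target is unreachable

    # Backtrack through bins labeled one less, preferring not to turn.
    path, current, direction = [target], target, None
    while current != source:
        r, c = current
        choices = [m for m in MOVES
                   if label.get((r + m[0], c + m[1])) == label[current] - 1]
        direction = direction if direction in choices else choices[0]
        current = (r + direction[0], c + direction[1])
        path.append(current)
    return path[::-1]


grid = ["...",
        ".#.",
        "..."]
print(lee_route(grid, (0, 0), (2, 2)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
```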


Area Routing Algorithm: Hightower (Line-Search) Algorithm
[For general-shaped areas]


[Figure: Line-search routing, showing escape lines from the source and target and the intersection of escape lines.]


1. Extend lines from both the source and target toward each other.
2. When an extended line, known as an escape line, meets
an obstacle, choose a point on the escape line from which to
project another escape line at right angles to the old one. This
point is the escape point.


Special Routing: Clock Routing


Gate arrays normally use a clock spine (a regular grid), eliminating the need 
for special routing. 


The clock distribution grid is designed at the same time as the gate-array
base to ensure minimum clock skew and minimum clock latency, given
power-dissipation and clock-buffer area limitations.


Cell-based ASICs may use either a clock spine, a clock tree, or a hybrid 
approach. 


The clock-routing figure shows how a clock router may minimize clock skew in a clock spine
by making the path lengths, and thus net delays, to every leaf node equal,
using jogs in the interconnect paths if necessary.


More sophisticated clock routers perform clock-tree synthesis
(automatically choosing the depth and structure of the clock tree) and
clock-buffer insertion (equalizing the delay to the leaf nodes by balancing
interconnect delays and buffer delays).




FIGURE: Clock routing. (a) A clock network for the cell-based ASIC.
(b) Equalizing the interconnect segments between CLK and all
destinations (by including jogs if necessary) minimizes clock skew.
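The length-equalizing idea can be sketched as follows. The sketch assumes that delay is proportional to interconnect length and that every leaf presents the same load; real clock routers balance delay, not just length. The leaf names and lengths are hypothetical.

```python
def jog_lengths(path_lengths):
    """Extra length to add to each CLK-to-leaf path so all paths match the longest."""
    longest = max(path_lengths.values())
    return {leaf: longest - length for leaf, length in path_lengths.items()}


def length_skew(path_lengths):
    """Difference between the longest and shortest CLK-to-leaf path."""
    return max(path_lengths.values()) - min(path_lengths.values())


# Hypothetical CLK-to-flip-flop path lengths in micrometres.
paths = {"ff1": 120, "ff2": 90, "ff3": 105}
jogs = jog_lengths(paths)
balanced = {leaf: paths[leaf] + jogs[leaf] for leaf in paths}
print(jogs)                                       # {'ff1': 0, 'ff2': 30, 'ff3': 15}
print(length_skew(paths), length_skew(balanced))  # 30 0
```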
Special Routing: Power Routing


Power bus width 


Each of the power buses has to be sized according to the current it
will carry.


Too much current in a power bus can lead to a failure through a
mechanism known as electromigration.


The required power-bus widths can be estimated automatically
from library information or from a separate power-simulation tool, or
the widths can be entered into the routing software by hand.


Many routers use a default power-bus width, so it is quite easy
to complete routing of an ASIC without even knowing about this
problem.
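A back-of-the-envelope sketch of sizing a bus for electromigration. It assumes the limit is stated as a maximum current per unit bus width for one metal layer. The 1 mA/µm limit and the other numbers are placeholders, not values for any real process; real limits come from the ASIC vendor's design rules.

```python
def supply_current_ma(power_mw, vdd_v):
    """Average supply current (mA) drawn by a block: I = P / VDD."""
    return power_mw / vdd_v


def min_bus_width_um(current_ma, jmax_ma_per_um, min_width_um):
    """Narrowest bus (µm) that keeps current per unit width within the EM limit."""
    return max(min_width_um, current_ma / jmax_ma_per_um)


# Placeholder numbers: a 90 mW block at 1.8 V on a layer limited to 1 mA/µm.
current = supply_current_ma(90.0, 1.8)         # about 50 mA
width = min_bus_width_um(current, 1.0, 0.5)    # about 50 µm
print(current, width)
```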




Gate-Array ASIC 


Gate arrays normally use a regular power grid as part of 
the gate-array base. 


The gate-array logic cells contain two fixed-width power 
buses inside the cell, running horizontally on m1. 


The horizontal m1 power buses are then strapped in a 
vertical direction by m2 buses, which run vertically 
across the chip. 




Cell-based ASIC 


Standard cells are constructed in a similar fashion to gate-array cells, with 
power buses running horizontally in m1 at the top and bottom of each 
cell. 

A row of standard cells uses end-cap cells that connect to the VDD and VSS 
power buses placed by the power router. 

Power routing of cell-based ASICs may include the option of adding
vertical m2 straps at specified intervals.

In a three-level metal process, power routing is similar to that in two-level metal
ASICs. Power buses inside the logic cells are still normally run on m1.
Using HVH routing, it would be possible to run the power buses on m3 and
drop vias all the way down to m1 when power is required in the cells.


Circuit Extraction 


After detailed routing is complete, the exact length and position of each
interconnect for every net is known.
Now the parasitic capacitance and resistance associated with each
interconnect, via, and contact can be calculated.


This data is generated by a circuit-extraction tool in one of several formats.


Standard Parasitic Format (SPF)


The standard parasitic format (SPF) describes interconnect delay
and loading due to parasitic resistance and capacitance.


There are three different forms of SPF:


- Two of them (regular SPF and reduced SPF) contain the same
information, but in different formats, and model the behavior of
interconnect.
- The third form (detailed SPF) describes the actual parasitic
resistance and capacitance of each segment in a net.


The load at the output of gate A is represented by one of three models: lumped-C,
lumped-RC, or PI segment.




Figure: The regular and reduced standard parasitic format (SPF) models for
interconnect. (a) An example of an interconnect network with fanout. The driving-point
admittance of the interconnect network is Y(s). (b) The SPF model of the interconnect.
(c) The lumped-capacitance interconnect model. (d) The lumped-RC interconnect
model. (e) The PI segment interconnect model.


The values of C, R, C1, and C2 are calculated so that Y1(s), Y2(s), and Y3(s) are the
first-, second-, and third-order Taylor-series approximations to Y(s).
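Matching Taylor-series terms gives closed-form values for the model elements. This sketch assumes the driving-point admittance has already been expanded as Y(s) = y1*s + y2*s^2 + y3*s^3 + ...; for an RC network, y1 is the total capacitance and y2 is negative. The function names are hypothetical. Expanding s*C/(1 + s*R*C) gives s*C - s^2*R*C^2 + ... . Expanding the PI segment s*C1 + s*C2/(1 + s*R*C2) gives s*(C1 + C2) - s^2*R*C2^2 + s^3*R^2*C2^3 - ... . Matching the first one, two, or three terms yields:

```python
def lumped_c(y1):
    """Lumped-C: Y1(s) = s*C matches the first-order term."""
    return y1


def lumped_rc(y1, y2):
    """Lumped-RC: s*C/(1 + s*R*C) = s*C - s**2*R*C**2 + ... matches two terms."""
    c = y1
    r = -y2 / c**2
    return r, c


def pi_segment(y1, y2, y3):
    """PI segment: s*C1 + s*C2/(1 + s*R*C2) matches three terms.

    Solves C1 + C2 = y1, -R*C2**2 = y2 and R**2*C2**3 = y3.
    """
    c2 = y2**2 / y3
    r = -y3**2 / y2**3
    c1 = y1 - c2
    return c1, r, c2


# Round trip on a known PI network: C1 = 2, R = 3, C2 = 4 (arbitrary units).
y1, y2, y3 = 2 + 4, -3 * 4**2, 3**2 * 4**3
print(pi_segment(y1, y2, y3))  # (2.0, 3.0, 4.0)
```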




The key features of regular and reduced SPF are as follows: 


The loading effect of a net as seen by the driving gate is represented by
choosing one of three different RC networks: lumped-C, lumped-RC, or PI
segment (selected when generating the SPF) [O'Brien and Savarino, 1989].


The pin-to-pin delays of each path in the net are modeled by a simple RC 
delay (one for each path). This can be the Elmore constant for each path, but it 
need not be. 
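The Elmore constant for a path in an RC tree is the sum, over each resistor on the path from the driver, of that resistance multiplied by all the capacitance downstream of it. A minimal sketch, assuming the net is given as a tree of hypothetical node names with resistances in ohms and capacitances in femtofarads (so the delays come out in femtoseconds):

```python
def elmore_delays(parent, r, c):
    """Elmore delay from the root (driver) to every node of an RC tree.

    parent[node] is the upstream node (None for the root), r[node] is the
    resistance of the edge from parent to node, c[node] the node capacitance.
    """
    children = {node: [] for node in parent}
    root = None
    for node, up in parent.items():
        if up is None:
            root = node
        else:
            children[up].append(node)

    # Depth-first order: every node appears before its children.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])

    # Total capacitance downstream of (and including) each node.
    down = {}
    for node in reversed(order):
        down[node] = c.get(node, 0.0) + sum(down[k] for k in children[node])

    delay = {}
    for node in order:
        up = parent[node]
        delay[node] = 0.0 if up is None else delay[up] + r[node] * down[node]
    return delay


# Hypothetical net: driver A feeds n1, which branches to loads B and C.
parent = {"A": None, "n1": "A", "B": "n1", "C": "n1"}
r = {"n1": 100, "B": 200, "C": 300}     # ohms
c = {"n1": 10, "B": 20, "C": 30}        # femtofarads
delays = elmore_delays(parent, r, c)
print(delays["B"], delays["C"])         # 10000.0 15000.0 (fs)
```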


The reduced SPF (RSPF) contains the same information as regular SPF,
but uses the SPICE format.


Detailed SPF: 


The detailed SPF (DSPF) shows the resistance and capacitance of each
segment in a net, again in a SPICE format. There are no models or
assumptions on calculating the net delays in this format.


Design-Rule Check (DRC)


ASIC designers perform two major checks before fabrication. 
DRC: 


The first check is a design-rule check (DRC) to ensure that nothing
has gone wrong in the process of assembling the logic cells and
routing.


The DRC may be performed at two levels. 


Phantom-Level DRC: 


The first level of DRC is a phantom-level DRC, which checks for
shorts, spacing violations, or other design-rule problems between
logic cells.


This is principally a check of the detailed router. 


If the real library-cell layouts (sometimes called hard layout) can be
accessed, we can instantiate the phantom cells and perform a
second-level DRC at the transistor level.
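A much-simplified sketch of a phantom-level check. It assumes every shape is an axis-aligned rectangle tagged with a net and a layer, and that one minimum-spacing rule applies per layer. Real rule decks contain many more rules, and this sketch simply skips pairs of shapes on the same net. The shapes and spacing values are hypothetical.

```python
from itertools import combinations


def gap(a, b):
    """Edge-to-edge (Euclidean) distance between rectangles (x1, y1, x2, y2).

    Returns 0.0 when the rectangles touch or overlap.
    """
    dx = max(b[0] - a[2], a[0] - b[2], 0)
    dy = max(b[1] - a[3], a[1] - b[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def phantom_drc(shapes, min_space):
    """Check every pair of shapes on the same layer but on different nets.

    shapes: list of (net, layer, rect); min_space: layer -> minimum spacing.
    Touching or overlapping shapes on different nets are reported as shorts.
    """
    errors = []
    for (n1, l1, r1), (n2, l2, r2) in combinations(shapes, 2):
        if l1 != l2 or n1 == n2:
            continue
        d = gap(r1, r2)
        if d == 0:
            errors.append(("short", n1, n2))
        elif d < min_space[l1]:
            errors.append(("spacing", n1, n2, d))
    return errors


# Hypothetical shapes (units: lambda) and per-layer spacing rules.
shapes = [
    ("a", "m1", (0, 0, 10, 2)),
    ("b", "m1", (0, 3, 10, 5)),
    ("c", "m1", (0, 4, 10, 6)),
    ("d", "m2", (0, 0, 10, 2)),
]
for error in phantom_drc(shapes, {"m1": 3, "m2": 4}):
    print(error)
# ('spacing', 'a', 'b', 1.0)
# ('spacing', 'a', 'c', 2.0)
# ('short', 'b', 'c')
```

Checking every pair is quadratic in the number of shapes; production checkers use spatial indexing to avoid this.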




Dracula check: 

This is principally a check of the correctness of the library cells. 
Normally the ASIC vendor will perform this check using its own 
software as a type of incoming inspection. 

The Cadence Dracula software is one de facto standard in this area, 
and you will often hear reference to a Dracula deck that consists of 
the Dracula code describing an ASIC vendor's design rules. 

Sometimes ASIC vendors will give their Dracula decks to customers so
that the customers can perform the DRCs themselves.




Layout Versus Schematic (LVS) check:

The second check is a layout-versus-schematic (LVS) check, to ensure that what is about to be committed to silicon
is what is really wanted.

An electrical schematic is extracted from the physical 
layout and compared to the netlist. 

This closes a loop between the logical and physical design 
processes and ensures that both are the same. 

The LVS check is not as straightforward as it may sound,
however.




Problems in LVS check:
The first problem is that the transistor-level netlist for a large ASIC forms an
enormous graph.


LVS software essentially has to match this graph against a reference
graph that describes the design.


Ensuring that every node corresponds exactly to an element in the
schematic (or HDL code) is a very difficult task.


The first step is normally to match certain key nodes (such as the
power supplies, inputs, and outputs), but the process can very quickly
become bogged down in the thousands of mismatch errors that are
inevitably generated initially.




The second problem with an LVS check is creating a true reference.
The starting point may be HDL code or a schematic.


Logic synthesis, test insertion, clock-tree synthesis, logical-to-physical
pad mapping, and several other design steps each modify the
netlist, so the reference netlist may not be what we wish to fabricate.


In this case designers increasingly resort to formal verification, which
extracts a Boolean description of the function of the layout and
compares it to a known good HDL description.
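Production formal-verification tools use techniques such as binary decision diagrams or SAT solving rather than enumeration. The toy sketch below only illustrates the comparison step for a function with a handful of inputs. It assumes a Boolean function has already been extracted from the layout; here that is a hypothetical NAND-only full-adder sum, compared against the HDL description by trying every input combination.

```python
from itertools import product


def nand(a, b):
    return 1 - (a & b)


def xor_from_nands(a, b):
    """Four-NAND XOR, standing in for logic extracted from the layout."""
    x = nand(a, b)
    return nand(nand(a, x), nand(b, x))


def spec_sum(a, b, cin):
    """Known-good HDL description: full-adder sum."""
    return a ^ b ^ cin


def extracted_sum(a, b, cin):
    return xor_from_nands(xor_from_nands(a, b), cin)


def buggy_sum(a, b, cin):
    """A deliberately wrong implementation (OR instead of XOR)."""
    return (a | b) ^ cin


def find_mismatch(f, g, n_inputs):
    """Return the first input tuple where f and g differ, or None if equivalent."""
    for bits in product((0, 1), repeat=n_inputs):
        if f(*bits) != g(*bits):
            return bits
    return None


print(find_mismatch(spec_sum, extracted_sum, 3))  # None
print(find_mismatch(spec_sum, buggy_sum, 3))      # (1, 1, 0)
```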


